Barcoding sequences for identification of gene expression

ABSTRACT

Gene expression can be identified by analyzing a DNA sequence. The DNA sequence can include a barcode sequence that corresponds to a particular gene. The barcode sequence can be produced during the expression of a gene by first adding a Homologous Directed Repair (HDR) template including the barcode sequence into the DNA sequence of the gene and then splicing the barcode sequence out of an RNA precursor during the expression of the gene. As the barcode sequence is made available from the RNA precursor, it can be added to the DNA strand using HDR. The resulting DNA strand can be sequenced and the sequence data can be analyzed to identify the barcode sequence within the DNA sequence, which provides an indicator of the expression of the gene in DNA rather than RNA.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a division of U.S. Patent Application Publication No. U.S. 2018/0002748 filed on Jun. 16, 2017, entitled “Barcoding Sequences for Identification of Gene Expression” which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/357,828 filed on Jul. 1, 2016, entitled “Storage Through Iterative DNA Editing,” U.S. Provisional Application Ser. No. 62/399,190 filed on Sep. 23, 2016, entitled “Storage Through Iterative DNA Editing,” and U.S. Provisional Application Ser. No. 62/487,671 filed on Apr. 20, 2017, entitled “Mechanisms for Molecular Event Logging.” This application is related to U.S. Patent Application Publication No. U.S. 2018/0004537, entitled “Molecular State Machines” and U.S. Pat. No. 10,892,034 entitled “Timing of Logged Molecular Events” both filed on Jun. 16, 2017. All patents and patent applications included in this paragraph are expressly incorporated herein by reference.

REFERENCE TO A SEQUENCE LISTING SUBMITTED VIA EFS WEB

The content of the ASCII text file of the sequence listing named “MS2-0133USD1_Sequence_Listing_seq_ST25.txt” which is 3 kb in size was created on Jun. 6, 2022 and electronically submitted via EFS web is incorporated herein by reference in its entirety.

BACKGROUND

Cells having the same genes can produce different gene products depending on the environment of the cell. For example, the cells of an organism, such as a human, can have the same genes, but the genes can be expressed in different ways under different conditions. In this way, one cell having the genes of the organism can be expressed as a cell having a first function, such as a liver cell, and another cell having the genes of the organism can be expressed as a cell having a second function, such as a muscle cell. Additionally, genes of an organism can be expressed differently in healthy cells versus cells in a diseased state.

Typically, gene expression is monitored through the sequencing of ribonucleic acid (RNA). RNA is produced from deoxyribonucleic acid (DNA) as a template by which a gene product, such as a protein, is made. After the gene product is produced, the RNA used to make the gene product degrades and is no longer detectable after a period of time. RNA sequencing techniques can be used to detect RNA in a cell at a given time and thus, gene expression can be determined from the RNA-sequencing process.

The sequencing of RNA to track gene expression has limitations because, due to the transitory nature of RNA, the expression of genes can only be monitored at a specific point in time. Thus, tracking the expression of a gene over time requires multiple RNA sequencing operations to be performed over a period of time, which can increase the resources and expense of monitoring gene expression via RNA sequencing. Additionally, the RNA sequencing operations destroy the cell being studied and do not provide opportunities for further study of the gene expression of the cell.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter nor is it intended to be used to limit the scope of the claimed subject matter.

Gene expression can be monitored and identified by analyzing a DNA sequence. The DNA sequence can include a barcode sequence that corresponds to a particular gene. In some cases, the barcode sequence can uniquely identify the gene. When the gene is expressed, the barcode sequence can be produced and added to a DNA strand. In particular, an enzyme can produce a double strand break (DSB) at a cut site in the DNA strand. Homologous Directed Repair (HDR) can be utilized to add the barcode sequence into the DNA strand. The resulting new DNA strand can be sequenced and the sequence data can be analyzed to identify the barcode sequence within the DNA sequence.

The barcode sequence can be produced during the expression of a gene by first adding an HDR template to the DNA sequence of the gene. The HDR template can include the barcode sequence in addition to at least one splicing sequence. The HDR template can be inserted into a coding region of the gene or a non-coding region of the gene, such as the 3′ untranslated region (UTR) of the gene. As the gene is expressed, an RNA precursor can be produced that includes the HDR template. A splicing enzyme can remove the non-coding portions included in the RNA precursor, which includes the HDR template. The HDR template is then available to be added to a cut site of a DNA strand through homologous directed repair. DNA sequencing of the DNA strand can then be used to identify the presence of the barcode sequence in the DNA strand as an indicator of the gene being expressed.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.

FIG. 1 shows a schematic representation of cutting dsDNA with an enzyme and inserting new DNA by HDR.

FIG. 2 shows a schematic representation of cutting the dsDNA of FIG. 1 and inserting additional DNA by HDR.

FIG. 3 shows illustrative components for controlling expression of a gene product based on a signaling pathway.

FIG. 4 shows illustrative components for creating a log of multiple signals in a way that records relative signal strength.

FIG. 5 shows illustrative components of a cell for inserting new DNA into existing dsDNA.

FIG. 6 shows a diagram illustrating insertion of a first HDR template into a gene.

FIG. 7 shows a diagram illustrating the splicing of a second HDR template including a barcode sequence from an RNA precursor produced from the gene.

FIG. 8 shows a diagram illustrating insertion of the second HDR template into an additional polynucleotide.

FIG. 9 shows a diagram illustrating joining a first HDR template and a second HDR template to produce a third HDR template using an RNA substrate.

FIG. 10 shows a diagram illustrating insertion of a portion of the third HDR template into an additional polynucleotide.

FIG. 11 shows an illustrative process for identifying the expression of a gene by sequencing DNA that includes a barcode sequence corresponding to the gene.

FIG. 12 shows an additional illustrative process for identifying the expression of a gene by sequencing DNA that includes a barcode sequence corresponding to the gene.

FIG. 13 shows a system for designing barcode sequences and utilizing the barcode sequences to identify the expression of a gene.

DETAILED DESCRIPTION

This disclosure describes techniques to identify the expression of genes by analyzing DNA sequences rather than RNA sequences. The DNA sequences can include barcode sequences that correspond to the genes being expressed. In some cases, a barcode sequence can be used to uniquely identify a particular gene. In this way, the presence of a barcode sequence in a DNA sequence can indicate the expression of the gene.

Typically, DNA barcoding refers to utilizing relatively short sequences (e.g., less than 800 nucleotides) already found in the genes of an organism in the identification of the organism. Often, DNA barcoding relies on sequences of DNA included in particular locations of a gene to classify organisms within a taxonomy. The barcoding sequences described herein are different from those associated with conventional DNA barcoding because the barcoding sequences described in this application are introduced into the gene through intentional manipulation and are not inherently part of the gene. The barcoding sequences described herein correspond to arbitrary nucleotide sequences added to a particular gene or a particular set of genes and can be utilized to track the expression of the gene(s) in that the availability of the barcoding sequences to be inserted into another polynucleotide is based on the expression of the gene.

In various implementations, a first HDR process can be utilized to insert a first HDR template into a gene whose expression is to be tracked. The first HDR template can include a barcoding sequence that corresponds to the gene, as well as at least one splicing sequence. The splicing sequence can correspond to a sequence of nucleotides that is recognized by a splicing enzyme, such as a spliceosome. A spliceosome is a large and complex molecular machine found primarily within the splicing speckles of the cell nucleus of eukaryotic cells. The spliceosome is assembled from snRNAs and protein complexes. The spliceosome removes introns from a transcribed pre-mRNA, a type of primary transcript. This process is generally referred to as splicing. Only eukaryotes have spliceosomes and some organisms have a second spliceosome, the minor spliceosome. In some implementations, the first HDR template can be inserted into a coding region of the gene. The coding region of the gene includes sequences that can directly correlate to a gene product and sequences that do not contribute to the coding of the gene product. The sequences that code for a gene product can be referred to as exons, while the sequences that do not code for a gene product can be referred to as introns. In situations when the first HDR template is inserted into a coding region of the gene, the first HDR template can include two splicing sequences. In other implementations, the first HDR template can be inserted at the end of the 3′ untranslated region (UTR). In these instances, the first HDR template can include a single splicing sequence.

As the gene is expressed, the RNA precursor that is produced includes at least a 5′ UTR, exons, introns, and a 3′ UTR. The first HDR template can be included among the introns of the RNA precursor or in the 3′ UTR. As the non-coding sequences are removed from the RNA precursor to produce messenger RNA (mRNA) that includes the exons, the 5′ UTR, and the 3′ UTR, the first HDR template is also removed.

In some cases, the non-coding sequences can be removed from the RNA precursor using enzymes, such as spliceosomes. The spliceosomes can recognize specific sequences that are referred to herein as “splicing sequences” and make a cut at certain positions within the specific sequences. Splicing sequences can have a particular arrangement that includes a donor site at the 5′ end of the intron, a branch site near the 3′ end of the intron, and an acceptor site at the 3′ end of the intron. The splice donor site includes a conserved, almost invariant sequence GU at the 5′ end of the intron, within a larger, less highly conserved region. The splice acceptor site at the 3′ end of the intron terminates the intron with a conserved, almost invariant AG sequence. Upstream (5′-ward) from the AG there is a region high in pyrimidines (C and U), or polypyrimidine tract. Further upstream from the polypyrimidine tract is the branchpoint, which includes an adenine nucleotide involved in lariat formation. An example splicing sequence for an intron in International Union of Pure and Applied Chemistry (IUPAC) nucleic acid notation can include: G-G-[cut]-G-U-R-A-G-U (donor site) . . . intron sequence . . . Y-U-R-A-C (branch sequence 20-50 nucleotides upstream of acceptor site) . . . Y-rich-N-C-A-G-[cut]-G (acceptor site), where Y indicates a pyrimidine, N corresponds to any nucleotide, and R corresponds to a purine.
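As a non-limiting illustration, the example splicing sequence above can be expressed as a pattern search over an RNA precursor. The following is a minimal sketch only: the function names, the regular-expression encoding of the IUPAC codes, and the choice to measure the 20-50 nucleotide branch distance from the start of the branch match to the start of the acceptor match are assumptions made for illustration, and the pyrimidine-rich tract is not checked. It does not model how a spliceosome actually selects splice sites.

```python
import re

# IUPAC RNA codes used in the example consensus (R = purine, Y = pyrimidine, N = any).
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U",
             "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}

def iupac_to_regex(pattern):
    """Translate an IUPAC-coded RNA pattern into a regular expression."""
    return "".join(IUPAC_RNA[code] for code in pattern)

DONOR = re.compile(iupac_to_regex("GGGURAGU"))  # cut falls after the first two G's
BRANCH = re.compile(iupac_to_regex("YURAC"))
ACCEPTOR = re.compile(iupac_to_regex("NCAGG"))  # cut falls between AG and the last G

def find_intron(pre_mrna, min_gap=20, max_gap=50):
    """Return (start, end) of the first candidate intron, end exclusive, or None."""
    for donor in DONOR.finditer(pre_mrna):
        intron_start = donor.start() + 2
        for acceptor in ACCEPTOR.finditer(pre_mrna, donor.end()):
            intron_end = acceptor.end() - 1
            # Require a branch sequence 20-50 nt upstream of the acceptor match.
            for branch in BRANCH.finditer(pre_mrna, donor.end(), acceptor.start()):
                if min_gap <= acceptor.start() - branch.start() <= max_gap:
                    return intron_start, intron_end
    return None
```

In such a sketch, removing `pre_mrna[start:end]` from the precursor leaves the flanking exons joined, while the removed span is the portion that would carry an inserted HDR template.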

After the HDR template including the barcode sequence is spliced from the RNA precursor, it is available to be inserted into a double stranded DNA molecule by a second HDR operation. In particular, an enzyme can create a DSB at a target site of the double stranded DNA molecule that is homologous with end portions of the second HDR template. The HDR template can then be inserted into the sequence of the double stranded DNA molecule. The double stranded DNA molecule can then be sequenced and analyzed. The analysis of the sequence data from the double-stranded DNA (dsDNA) molecule can indicate the presence of the barcode sequence, which corresponds to the expression of the gene.

By utilizing the implementations described herein, the expression of a gene can be identified through the sequencing of DNA instead of conventional techniques that operate by sequencing of RNA. In this way, the expression of the gene can be more accurately determined since the analysis is performed with respect to DNA, which is more stable and less transitory than RNA. Thus, rather than obtaining a snapshot of only the RNA that exists at a particular point in time, implementing the techniques described herein can provide a record of the expression events of the gene that have taken place over a period of time.
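As a non-limiting illustration of the analysis step, sequence data can be scanned for known barcode sequences and the occurrences tallied per gene. The sketch below is an assumption-laden example: the function names, the exact-match search, and the decision to count both strand orientations are illustrative choices, and a practical pipeline would typically tolerate sequencing errors. A barcode that equals its own reverse complement would be counted twice by this sketch.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_barcodes(reads, barcodes):
    """Count exact barcode occurrences per gene across DNA reads.

    reads    -- iterable of DNA read strings
    barcodes -- dict mapping a gene name to its barcode sequence
    """
    counts = {gene: 0 for gene in barcodes}
    for read in reads:
        for gene, barcode in barcodes.items():
            # A read may come from either strand, so check both orientations.
            counts[gene] += read.count(barcode)
            counts[gene] += read.count(reverse_complement(barcode))
    return counts
```

A nonzero count for a gene in such a tally would correspond to the barcode having been inserted into the sequenced DNA, which in turn indicates that the gene was expressed.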

Homology Directed Repair

HDR is a mechanism in cells to repair DSBs. The most common form of HDR is homologous recombination. The HDR repair mechanism can be used by the cell when there is a homologous piece of DNA present to repair the DSB. HDR is considered a highly accurate mechanism for DSB repair due to the requirement of sequence homology between the damaged and intact donor strands of DNA. The process is nearly error-free if the DNA template used for repair is identical to the original DNA sequence at the DSB, or it can introduce very specific mutations into the damaged DNA if there are differences between the DNA template used for repair and the original DNA sequence. This disclosure discusses use of an HDR template that adds a new DNA sequence at the point of the DSB as part of the repair process.

HDR includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber. 2010 Annu. Rev. Biochem. 79:181-211). HR has the longest sequence homology requirements between the donor and acceptor DNA. Other forms of HDR, including SSA and breakage-induced replication, require shorter sequence homology relative to HR. HDR at nicks (single-stranded breaks) can occur via a mechanism distinct from HDR at DSBs (Davis and Maizels. PNAS (0027-8424), 111 (10), p. E924-E932).

The terms “homology” and “homologous” as used herein in reference to nucleotide sequences refer to a degree of complementarity with other nucleotide sequences. There may be partial homology or complete homology (i.e., identity). A nucleotide sequence which is partially complementary, i.e., “substantially homologous,” to a nucleic acid sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target sequence under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target sequence which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

The terms “homology” and “homologous” as used herein in reference to amino acid sequences refer to the degree of identity of the primary structure between two amino acid sequences. Such a degree of identity may be directed to a portion of each amino acid sequence, or to the entire length of the amino acid sequence. Two or more amino acid sequences that are “substantially homologous” may have at least 50% identity, preferably at least 75% identity, more preferably at least 85% identity, most preferably at least 95%, or 100% identity.

By “hybridizable” or “complementary” or “substantially complementary” it is meant that a polynucleotide (e.g. DNA or RNA) comprises a sequence of nucleotides that enables it to non-covalently bind to another polynucleotide in a sequence-specific, antiparallel manner (i.e., a polynucleotide specifically binds to a complementary polynucleotide) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. As is known in the art,

Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the “stringency” of the hybridization.

It is understood in the art that the sequence of a polynucleotide need not be 100% complementary to that of its target polynucleotide to be specifically hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target site within the target polynucleotide sequence to which it is targeted. For example, an antisense polynucleotide in which 18 of 20 nucleotides of the antisense compound are complementary to a target site, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining non-complementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of polynucleotide sequences within polynucleotides can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
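As a non-limiting illustration of the 18-of-20 example above, percent complementarity between two equal-length DNA sequences can be computed by pairing them in antiparallel orientation and counting Watson-Crick matches. The sketch below is an ungapped, position-by-position comparison under stated assumptions; it is not the alignment performed by BLAST or the Gap program, and the function name is hypothetical.

```python
# Watson-Crick pairing partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def percent_complementarity(probe, target_site):
    """Percent of positions that base-pair when two strands anneal antiparallel.

    Both sequences are written 5' to 3' and must have the same length.
    """
    if len(probe) != len(target_site):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    # Position i of the probe faces position (n - 1 - i) of the target site.
    paired = sum(COMPLEMENT[p] == t for p, t in zip(probe, reversed(target_site)))
    return 100.0 * paired / len(probe)
```

With a 20-nt probe in which 18 positions pair with the target site, this calculation yields 90 percent, matching the example in the preceding paragraph.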

FIG. 1 shows an illustrative schematic of operations to add a new DNA sequence into dsDNA 100 through HDR. The dsDNA can, in some cases, be included in a gene for which its expression is being monitored. The dsDNA 100 includes a target site 102 that directs an enzyme 104 to create a DSB in the dsDNA 100 within the target site 102 at a specific cut site 106. The DSB may be created with blunt ends or with sticky ends depending on the specific enzyme and technique for making the DSB. The target site 102 is a sequence of DNA recognized by an enzyme that creates DSBs in dsDNA. By “enzyme reactive conditions” it is meant that any necessary conditions are available in an environment (i.e., such factors as temperature, pH, and lack of inhibiting substances) which will permit the enzyme to function. Enzyme reactive conditions can be either in vitro, such as in a test tube, or in vivo, such as within a cell.

The target site 102 may be intentionally introduced into the dsDNA 100 to enable the manipulations described below. Alternatively, a pre-existing portion of the dsDNA 100 may be selected as the target site 102. If a pre-existing portion of the dsDNA 100 is selected as the target site 102, then the sequence of other components of the system will be designed with reference to the sequence of the target site 102. In some implementations, the target site 102 is unique such that there is only one target site 102 in the entire dsDNA strand and/or only one target site 102 throughout all the DNA in the cell. The dsDNA 100 may be genomic DNA inside a living prokaryotic or eukaryotic cell, DNA introduced to a living cell such as a plasmid or vector, or DNA in a cell-free system. The dsDNA 100 may exist as either linear or circular DNA prior to introduction of the DSB.

The enzyme 104 that creates the DSB may be any protein, protein-RNA complex, or protein-DNA complex (including multimeric complexes) that has the property of creating a DSB in dsDNA at the cut site 106. Non-limiting examples of suitable enzymes include restriction enzymes, homing endonucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), CRISPR/Cas, and NgAgo. These types of enzymes are all examples of site-specific nucleases that are capable of causing a DSB at a cut site 106 within a target site 102. Further details about site-specific nucleases are provided below.

After creating a DSB at the cut site 106, the target site 102 is split into two subsequences 102(A), 102(B) on either side of the DSB. Each of the two subsequences 102(A), 102(B) may, in an implementation, be between 5 and 20 nucleotides (nt) in length. Thus, the target site 102 may, in an implementation, be between 10 and 40 nt in length. In some implementations, the two subsequences 102(A), 102(B) may contain identical DNA sequences. The cut site 106 may be located in the middle of the target site 102 or it may be located elsewhere within the target site 102. The schematic shown in FIG. 1 illustrates a DSB with blunt ends, but as described above DSBs with sticky ends are also covered within the scope of this disclosure.

An HDR template 108 is brought into proximity of the dsDNA 100 with the DSB. The HDR template 108 is single strand (ss) DNA or ssRNA. The HDR template repairs the DSB and inserts a polynucleotide sequence through the process of homology directed repair. HDR templates used to create specific mutations or insert new elements into a gene require a certain amount of homology surrounding the target site that will be modified. Thus, the HDR template 108 includes a 3′-end sequence 110 complementary to the first subsequence of the target site 102(A) and a 5′-end sequence 112 complementary to a second subsequence of the target site 102(B). Because they are complementary sequences, the lengths of the 3′-end sequence 110 and the 5′-end sequence 112 are the same or about the same as the respective subsequences of the target site 102(A), 102(B). Thus, both the 3′-end sequence 110 and the 5′-end sequence 112 may be between 5 and 20 nt in length. The middle portion of the HDR template 108 contains a region 114 encoding a second target site 116. This middle region 114 may contain two subsequences 114(A), 114(B) on either side of the point where the second target site 116 will be cut by a second enzyme. The length of the two subsequences 114(A), 114(B) in the middle portion 114 of the HDR template 108 may be different than the lengths of the two subsequences 102(A), 102(B) but may follow the same size range and be between 5 and 20 nt in length. Thus, the total length of the HDR template 108 may be between about 20 and 80 nt. Because the middle region 114 encodes a second target site 116, the HDR template 108 itself provides the basis for this process to be repeated iteratively. So long as a signal is detected by a cell and the components for creating a DSB and performing HDR are available, this process may continue until the signal ceases. Thus, a length of the inserted DNA may correlate with a duration of the signal.
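As a non-limiting illustration of the template layout described above, a single-stranded HDR template can be modeled as the reverse complement of the edited strand, so that its 5′ end is complementary to subsequence 102(B) and its 3′ end is complementary to subsequence 102(A). The sketch below assumes all inputs are written 5′ to 3′ on the strand containing the target site; the function and parameter names are hypothetical, and the 5 to 20 nt bounds are taken from the ranges stated in this disclosure.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def build_hdr_template(site_a, site_b, new_site_a, new_site_b, lo=5, hi=20):
    """Assemble a single-stranded HDR template, returned 5' to 3'.

    site_a, site_b         -- halves of the existing target site flanking the cut
    new_site_a, new_site_b -- halves of the second target site to be inserted
    """
    for part in (site_a, site_b, new_site_a, new_site_b):
        if not lo <= len(part) <= hi:
            raise ValueError(f"each subsequence must be {lo}-{hi} nt long")
    # Sequence the repaired strand should have after insertion at the cut site.
    edited_strand = site_a + new_site_a + new_site_b + site_b
    # The template pairs with that strand, so it is its reverse complement.
    return reverse_complement(edited_strand)
```

Because each of the four parts is 5 to 20 nt long, any template produced this way is between 20 and 80 nt in total, consistent with the range given above.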

The HDR template 108 then repairs the DSB through HDR. The efficiency of HDR may be low, and in some conditions, other repair mechanisms can predominate. The efficiency of HDR is determined in part by the concentration of donor DNA present at the time of repair, the length of the homology arms of the donor DNA, the cell cycle, and the activity of the endogenous repair systems. An overabundance of the HDR template 108 may be provided to increase efficiency of HDR. The overabundance of the HDR template 108 may be provided to a cell-free system by adding additional copies of the ssRNA or ssDNA manually or with the use of microfluidics. The HDR template 108 may also be provided, in overabundance if desired, by placing a gene encoding the HDR template 108 under control of a strong promoter and/or by having multiple copies of the gene encoding the HDR template 108 all undergoing transcription. In an implementation, this promoter may be regulated by a signaling pathway that responds to a signal. When the signal is detected, the promoter is turned on and more copies of the HDR template 108 are generated.

The 5′-ended DNA strand is resected at the DSB to create a 3′ overhang. This will serve as both a substrate for proteins required for strand invasion and a primer for DNA repair synthesis. The HDR template 108 can then displace one strand of the homologous DNA duplex and pair with the other; this causes formation of hybrid DNA referred to as the displacement loop (“D loop”) 118. The recombination intermediates can then be resolved to complete the DNA repair process. As mentioned above, an overabundance of the HDR template 108 may be provided. One of ordinary skill in the art will understand how to perform HDR with dsDNA 100 having a DSB and an HDR template 108. Possible protocols for performing HDR are provided in Jie Liu et al., In Vitro Assays for DNA Pairing in Recombination Associated DNA Synthesis, 745 Methods Mol. Bio. 363 (2011); Gratz, S. et al., Highly specific and efficient CRISPR/Cas9-catalyzed homology-directed repair in Drosophila, 196 Genetics 967 (2014); Richardson, C. C. et al., Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA, 34 Nature Biotechnology 339 (2016); and Lin, S. et al., Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery, eLIFE (2014).

After the HDR template 108 invades the dsDNA, the D loop 118 is formed by hybridization of the 3′-end sequence 110 to the first subsequence 102(A) of the target site 102 and hybridization of the 5′-end sequence 112 to the second subsequence 102(B) of the target site 102. DNA polymerase synthesizes new ssDNA 120 complementary to the middle portion 114 of one strand of the dsDNA 100. DNA ligase joins the sugar-phosphate backbone of the newly synthesized ssDNA 120 with the remainder of that strand of the dsDNA 100. This forms one strand of the second target site 116.

Hybridization requires that the two polynucleotides contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two polynucleotides depend on the length of the polynucleotides and the degree of complementation, which are variables well known in the art. The greater the degree of complementation between two nucleotide sequences, the greater the value of the melting temperature (T_(m)) for hybrids of polynucleotides having those sequences. For hybridizations between polynucleotides with short stretches of complementarity (e.g. complementarity over 35 nt or less, 30 nt or less, 25 nt or less, 22 nt or less, 20 nt or less, or 18 nt or less) the position of mismatches becomes important. This is understood by one of ordinary skill in the art and described in Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001) at sec. 11.7-11.8. Typically, the length for a hybridizable polynucleotide is at least about 10 nt. Illustrative minimum lengths for a hybridizable polynucleotide are: at least about 15 nt; at least about 20 nt; at least about 22 nt; at least about 25 nt; and at least about 30 nt. Furthermore, the skilled artisan will recognize that the temperature, pH, and wash solution salt concentration may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation.

Following repair of the first strand of the dsDNA 100, the second strand of the dsDNA 100 is repaired by DNA polymerase and DNA ligase using the sequence of the new ssDNA 120 in the repaired, first strand as a template. This completes the repair of the dsDNA 100 resulting in dsDNA that includes the second target site 116 inserted within the first target site 102.
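As a non-limiting illustration, the net outcome of the cut-and-repair operations described for FIG. 1 can be modeled at the sequence level as an insertion at the cut site. The sketch below tracks only one strand as a string and ignores the molecular intermediates (resection, D loop formation, synthesis, ligation); the function name, the requirement that the target site be unique, and the `cut_offset` parameter are assumptions made for illustration.

```python
def insert_by_hdr(strand, target_site, cut_offset, insert):
    """Model HDR as inserting `insert` at the cut site inside `target_site`.

    strand      -- one strand of the dsDNA, written 5' to 3'
    target_site -- sequence recognized by the enzyme
    cut_offset  -- position of the cut within the target site (0..len)
    insert      -- new sequence added by the HDR template (e.g. a second target site)
    """
    site_start = strand.find(target_site)
    if site_start < 0:
        return strand  # no target site present, so no DSB and no insertion
    if strand.count(target_site) != 1:
        raise ValueError("this sketch assumes the target site is unique")
    if not 0 <= cut_offset <= len(target_site):
        raise ValueError("cut site must fall within the target site")
    cut = site_start + cut_offset
    return strand[:cut] + insert + strand[cut:]
```

One consequence visible in such a model is that the insertion splits the original target site, so the first enzyme no longer finds an intact copy of its recognition sequence in the repaired strand.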

DNA polymerases are enzymes that synthesize DNA molecules from individual deoxyribonucleotides. During this process, DNA polymerase “reads” an existing DNA strand to create a new, complementary strand. DNA ligase is a specific type of enzyme, a ligase, that facilitates the joining of DNA strands together by catalyzing the formation of a phosphodiester bond. It plays a role in repairing single-strand breaks. The mechanism of DNA ligase is to form two covalent phosphodiester bonds between the 3′ hydroxyl end of one nucleotide (“acceptor”) and the 5′ phosphate end of another (“donor”). The DNA ligase from bacteriophage T4 is the ligase most commonly used in laboratory research. It can ligate cohesive or “sticky” ends of DNA, oligonucleotides, as well as RNA and RNA-DNA hybrids, but not single-stranded polynucleotides. It can also ligate blunt-ended DNA.

Note that the HDR template 108 includes two types of regions: end regions and a middle region. The end regions are homologous to one of the strands of the dsDNA 100 on either side of the DSB. Here, the homologous regions are shown by the 3′-end sequence 110 and the 5′-end sequence 112. The homology need not be 100% but only to the extent that the 3′-end sequence 110 and the 5′-end sequence 112 hybridize to one strand of the dsDNA 100. The middle region is the middle portion 114 of the HDR template 108 that encodes the sequence of the second target site 116. Independently varying both the end regions and the middle region allows for creation of multiple different HDR templates 108 from a relatively limited set of end regions and middle regions. Thus, the middle region of an inserted HDR template 108 need not have the same target site 102 or cut site 106 as the dsDNA 100 it is being inserted into.

Following HDR, the dsDNA 100 includes the first subsequence 102(A) of the first target site 102 followed by the first subsequence 116(A) of the second target site 116. The DNA sequence 122 represented by this order of the two subsequences 102(A), 116(A) of the two target sites may represent a particular signal combination (e.g., temperature above 30° C. followed by pH under 5). As mentioned above, a length of the subsequence 102(A) is from 5 to 20 nt and the length of the subsequence 114(A) is also from 5 to 20 nt. Thus, in an implementation, the total length of the DNA sequence 122 is from 10 to 40 nt.

HDR, however, is not the only way to repair a DSB. Non-Homologous End-Joining (NHEJ) is a pathway that repairs double-strand breaks in DNA and may be favored over HDR in many conditions. NHEJ is referred to as “non-homologous” because the break ends are directly ligated without the need for a homologous template. NHEJ is active throughout the cell cycle and has a higher capacity for repair, as there is no requirement for a repair template (sister chromatid or homologue) or extensive DNA synthesis. NHEJ also finishes repair of most types of breaks in tens of minutes, an order of magnitude faster than HDR. Thus, in many cells there is competition between HDR and NHEJ. If the ratio of HDR to NHEJ is high enough, HDR will continue. However, in the presence of NHEJ some of the DSBs formed by the enzyme 104 will rejoin without an insert.

NHEJ is consequently the principal means by which DSBs are repaired in natural cells. NHEJ-mediated repair is prone to generating indel errors. Indel errors generated in the course of repair by NHEJ are typically small (1-10 nt) but extremely heterogeneous. There is consequently about a two-thirds chance of causing a frameshift mutation. Thus, it may be desirable to minimize NHEJ and increase the probability that a DSB will be repaired by HDR. The likelihood of HDR being used may be improved by inhibiting components of the NHEJ process. Addition of small molecules such as NU7441 and KU-0060648 is one technique for inhibiting NHEJ through inhibition of DNA-dependent protein kinase, catalytic subunit (“DNA-PKcs”). Techniques for enhancing HDR efficiency in this way are described in Maruyama et al., Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining, 33(5) Nature Biotechnology 538 (2015) and Robert et al., Pharmacological inhibition of DNA-PK stimulates Cas9-mediated genome editing, 7 Genome Medicine 93 (2015). In an implementation, HDR efficiency may be improved by suppressing the molecules KU70, KU80, and/or DNA ligase IV, which are involved in the NHEJ pathway. In addition to the suppression, the Cas9 system, E1B55K, and/or E4orf6 may be expressed to further increase HDR efficiency and reduce NHEJ activity. Techniques for enhancing HDR efficiency in this way are described in Chu et al., Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells, 33(5) Nature Biotechnology 543 (2015). Further, use of a single-stranded DNA oligo donor (ssODN) has been shown to improve the rate of HDR and knockin efficiency by up to 60% in Richardson et al., Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA, 34(3) Nature Biotechnology 339 (2016).
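The two-thirds figure follows from the fact that an indel shifts the reading frame whenever its length is not a multiple of three. The following minimal sketch illustrates that arithmetic under the simplifying assumption (not stated in this disclosure) that every indel length in a given range is equally likely; the function name and ranges are hypothetical.

```python
from fractions import Fraction

def frameshift_fraction(min_len=1, max_len=10):
    """Fraction of indel lengths in [min_len, max_len] that shift the frame.

    Assumes, purely for illustration, that each length is equally likely.
    An indel shifts the reading frame when its length is not divisible by 3.
    """
    lengths = range(min_len, max_len + 1)
    shifting = [n for n in lengths if n % 3 != 0]
    return Fraction(len(shifting), len(lengths))

print(frameshift_fraction())      # 7/10 for lengths 1-10
print(frameshift_fraction(1, 9))  # 2/3 for lengths 1-9
```

Real indel length distributions are not uniform, so the result is only a rough guide to why roughly two out of three small indels disrupt the reading frame.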

FIG. 2 shows schematic illustrations of further manipulations performed on the dsDNA 100 molecule of FIG. 1. A second enzyme 200 creates a second DSB at a second cut site 202 in the second target site 116. The second target site 116 has a different sequence than the first target site 102, and thus, the second enzyme 200 recognizes a different DNA sequence than the first enzyme 104. Creating a DSB in the second target site 116 at the cut site 202 creates the first subsequence 116(A) of the second target site 116 on one side of the cut site 202 and a second subsequence 116(B) of the second target site 116 on the other side of the cut site 202. In some implementations, the first subsequence 116(A) and the second subsequence 116(B) may have the same sequence. Thus, the first subsequence 116(A) and the second subsequence 116(B) may have the same nucleotide length. Also, if the first subsequence 116(A) and the second subsequence 116(B) are the same sequence, the second target site 116 may be thought of as having a single subsequence repeated once with a cut site 202 in the middle.

A second HDR template 204 contacts the dsDNA 100 to provide a template for HDR of the DSB. The second HDR template 204 includes a 3′-end sequence 206 that is homologous to one strand of the dsDNA 100 within the first subsequence 116(A) of the second target site 116. The second HDR template 204 also includes a 5′-end sequence 208 that is homologous to one strand of the dsDNA 100 within the second subsequence 116(B) of the second target site 116. The second HDR template 204 also includes a portion in the middle region 210 that encodes a third target site for a third enzyme. The middle region 210 includes a first subsequence 210(A) on one side of a third cut site 212 and a second subsequence 210(B) on the other side of the third cut site 212.

Annealing of the second HDR template 204 to one strand of the dsDNA 100 creates a D loop 214 by hybridization of the 3′-end sequence 206 to the subsequence 116(A) and hybridization of the 5′-end sequence 208 to the subsequence 116(B). DNA polymerase and DNA ligase repair the strand of the dsDNA 100 to which the second HDR template 204 is hybridized by creating new DNA 216. The second strand of the dsDNA 100 is then repaired using the first strand as a template.

The dsDNA 100 now includes the third target site 218 inserted into the middle of the second target site 116 (which is itself inserted in the middle of the first target site 102). The order of the subsequence 116(A) followed by the subsequence 218(A) forms a DNA sequence 220 that may create a record of a second combination of detected signals. Thus, the growing string of inserted DNA sequences can provide an ordered log of molecular events experienced by a cell. This process can repeat to record any number of molecular events.

Addition of HDR templates into existing DNA using the mechanisms described above may be regulated by signaling pathways as described in detail below. The encoding scheme described herein allows for insertion of DNA sequences of unbounded length. An HDR template that does not include a cut site may be added once, end the process of HDR, and create a record that a specified signal was detected. The dsDNA in a cell may have multiple different target sites at different locations that include different cut sites and are homologous to different HDR templates. This provides for orthogonal recording of signals without any linkage between the signals. For example, a first target site may be configured to integrate a first HDR template if the cell is exposed to radiation, a second target site may be configured to integrate a second HDR template if the cell is exposed to hydrocarbons, and a third target site may be configured to integrate a third HDR template if the cell is exposed to light. Each cell configured in this way will create independent logs of the signals (e.g., radiation, hydrocarbons, and light) that it was exposed to. A cell may be modified to have any number of orthogonal target sites.

The three target sites may be represented as X₁X₂, Y₁Y₂, and Z₁Z₂. The first portion of the target site (e.g., X₁, Y₁, or Z₁) corresponds to subsequence 102(A) or subsequence 116(A) shown in FIG. 1. The remaining portion of the target site (e.g., X₂, Y₂, or Z₂) corresponds to subsequence 102(B) or subsequence 116(B) shown in FIG. 1. Thus, each X, Y, and Z represents a DNA sequence of about 5 to 20 nt such as, for example only, ACTGAA, GCCTCAT, TGACG, etc. In some implementations X₁=X₂, etc., but in other implementations the first portion of a target site may be different in sequence and/or length from the remaining portion of the target site.

The HDR templates all have end regions that are homologous to one of the target sites. Thus, the HDR templates will have sequences of the structure: X₁aX₂, Y₁bY₂, and Z₁cZ₂ where “a,” “b,” and “c” represent DNA sequences of the middle regions. Recall that the middle region of the HDR templates may itself encode a target site. Thus, for example, “a” may represent X₁X₂, “b” may represent Z₁Z₂, and “c” may represent a different target site W₁W₂. If the middle region does encode a target site, integration of an HDR template into dsDNA may be followed by further integration of the same or a different HDR template. Insertion of an HDR template into dsDNA that has been itself created by integration of an HDR template is referred to in this disclosure as “iterative integration.”

Thus, a design using iterative integration of a single HDR template may record the presence of a signal and the length of the signal. For example, the HDR template may be XaXXaX and the initial insertion site may be XX. Iterative integration will result in a sequence that is represented by:

-   -   XXaXaXaXaX . . . XaXaXaXaXX

This sequence can keep growing continuously while the signal is detected. A potential problem is that the HDR templates may be cut by the same enzyme that creates a DSB at the insertion site because both include the sequence XX which is recognized by the enzyme used for this logging. Physical separation, splicing, self-excising elements, homologous bridges, or methylation may be used to prevent or decrease the amount of HDR templates that are cut before integration into the dsDNA.

In one configuration, the continued detection of multiple signals may be recorded by appropriately designed HDR templates and insertion sites. An HDR template with a sequence XaYYaX is expressed when a first signal “a” is detected. Similarly, an HDR template YbXXbY is expressed when a second signal “b” is detected. Initially, the cell may include a target site XX or YY. If the cell only includes the target site XX, presence of signal “b” will not be recorded until the HDR template associated with signal “a” is first integrated into the DNA of the cell. As each HDR template provides the target site for the other, alternating exposure to signals “a” and “b” or continued exposure to both signals leads to continued integration of the HDR templates. This alternating, iterative addition will result in a sequence represented by:

-   -   XaYbXaYbX . . . XbYaXbYaX

This provides sequential recording of signals “a” and “b” independent of the relative concentrations of the HDR templates XaYYaX and YbXXbY. This technique for logging multiple signals at the same location in DNA may be expanded to cover three, four, or even more different signals.
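The alternating integration described above can be illustrated with a purely symbolic sketch. It assumes a simplified model that is not part of this disclosure: each letter stands for a whole subsequence, the first and last characters of a template are its homologous end regions, and integration inserts the template's middle region at the cut between the two halves of the matching target site. The function names `integrate` and `read_log` are hypothetical.

```python
def integrate(dna, template):
    """Insert the middle of `template` at its target site in `dna`, if present.

    Symbolic model: the first and last characters of the template are its
    end regions, and together they spell the target site (e.g. "XX").
    If the target site is absent, the DNA is returned unchanged.
    """
    left, middle, right = template[0], template[1:-1], template[-1]
    site_start = dna.find(left + right)
    if site_start < 0:
        return dna
    cut = site_start + 1  # cut between the two halves of the target site
    return dna[:cut] + middle + dna[cut:]

def read_log(dna):
    """Read the recorded signals, oldest first, from the left half of the DNA."""
    half = dna[:len(dna) // 2]
    return "".join(ch for ch in half if ch.islower())

TEMPLATE_A = "XaYYaX"  # expressed when signal "a" is detected
TEMPLATE_B = "YbXXbY"  # expressed when signal "b" is detected

dna = "XX"  # the cell starts with only the XX target site
for template in (TEMPLATE_B, TEMPLATE_A, TEMPLATE_B, TEMPLATE_A):
    dna = integrate(dna, template)

print(dna)            # XaYbXaYYaXbYaX
print(read_log(dna))  # aba
```

In this model the first exposure to signal “b” leaves the DNA unchanged because no YY site exists yet, which mirrors the statement that signal “b” is not recorded until the template for signal “a” has been integrated.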

In one configuration, multiple signals may be associated with HDR templates that have the same target sites. For example, a first signal “a” and a second signal “b” may be associated respectively with the HDR templates XaXXaX and XbXXbX. Either HDR template may be integrated into the target site XX. Once integrated, both HDR templates also include the target site XX allowing for iterative addition of either or both. In most conditions, the level of relative incorporation of the two HDR templates will be proportional to the relative concentrations of HDR templates. The amount of each HDR template present in the cell may be designed to be proportional to the strength, frequency, and/or duration of the corresponding signal. For example, if signal “a” is strong and constant the cell may produce a relatively large amount of the XaXXaX template. When signal “b” is present, the amount of the XbXXbX template may increase and then that HDR template is also integrated into the DNA of the cell. So long as all components are present, iterative insertion of these two templates depends on relative strengths of signals “a” and “b” and will result in a sequence represented by:

-   -   X[a|b]X[a|b]X . . . X[a|b]X[a|b]X

where [a|b] is “a” or “b.” The relative amount of “a” vs. “b” in the DNA provides a record of which signal was strongest, and a change from a period of “a” dominance to a period of “b” dominance indicates a temporal change in the relative signal strengths. This configuration may be expanded to include three, four, or more different signals and HDR templates. Analysis of the DNA sequence created by this iterative and competitive integration of multiple HDR templates may be performed over defined lengths of nucleotides which represent periods of time. The lengths of nucleotides may be analyzed by considering a series of sliding windows (e.g., a 10,000 nt stretch of the DNA) and determining the relative level of Xa vs. Xb in a given window. This provides information about the relative strength of signals “a” and “b” during a given period of time.
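The sliding-window analysis can be sketched as follows. This is an illustrative simplification rather than the analysis method of this disclosure: the DNA is represented symbolically, windows are measured in recorded insertions instead of nucleotides, and the helper names and window sizes are hypothetical.

```python
def events_from_dna(dna):
    """Extract the ordered 'a'/'b' insertions from a symbolic DNA string."""
    return "".join(ch for ch in dna if ch in "ab")

def window_fractions(events, window, step):
    """Fraction of 'a' insertions in each sliding window over the events.

    `window` and `step` are counted in insertions, an assumption made here
    for simplicity; a real analysis would count nucleotides.
    """
    fractions = []
    for start in range(0, len(events) - window + 1, step):
        chunk = events[start:start + window]
        fractions.append(chunk.count("a") / window)
    return fractions

events = events_from_dna("XaXaXaXaXbXbXbXbX")
print(events)                          # aaaabbbb
print(window_fractions(events, 4, 2))  # [1.0, 0.5, 0.0]
```

A falling fraction across successive windows corresponds to a shift from a period of “a” dominance to a period of “b” dominance.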

One way of using this configuration is in a cell that has constitutive expression (rather than in response to a signal) of the first HDR template XaXXaX. This template will be expressed and present in the cell at a constant level. It may be thought of as a background signal. The level of the second HDR template XbXXbX will vary depending on the strength of signal “b.” Thus, the amount of the XbXXbX template integrated into the DNA indicates the relative strength of signal “b” as compared to the baseline established by expression of XaXXaX.

Another way of using the configuration described above is to use the presence of one of the HDR templates in the DNA of the cell as a temporal indication like a time stamp. For example, the concentration of the first HDR template may respond to the detection of a signal. If the signal is continually present, then the HDR template XaXXaX will be iteratively introduced into the DNA of the cell. As described above, the length of the insertion will depend on the duration that the signal “a” is present. Intentionally exposing the cell to signal “b” at known time points provides reference points in the DNA that can be correlated to the known times of exposure to signal “b.” When exposed to signal “b,” the expression of the second HDR template XbXXbX increases to a level greater than the expression of XaXXaX (e.g., the second HDR template may be regulated by a stronger promoter or present in more copies than the first HDR template). Thus, each point in the DNA that has an insertion of XbXbXb . . . indicates a time when the cell was exposed to “b.” For example, if the cell is exposed to signal “b” every 24 hours, each string of DNA between XbXbXb . . . sequences represents the activity of signal “a” during that 24-hour period.
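As a rough illustration of reading such time stamps, the sketch below splits a symbolic record of insertions at each run of “b” and counts the “a” insertions in between. The event-string representation and the function name are assumptions made for this example only.

```python
import re

def activity_between_stamps(events):
    """Count 'a' insertions in each stretch delimited by runs of 'b'.

    `events` is a symbolic record such as "aaabbabbbaaaab", where each run
    of 'b' is treated as one time stamp. The final entry counts insertions
    recorded after the last time stamp (0 if the record ends with a stamp).
    """
    stretches = re.split(r"b+", events)
    return [len(stretch) for stretch in stretches]

print(activity_between_stamps("aaabbabbbaaaab"))  # [3, 1, 4, 0]
```

If the exposures to signal “b” were spaced 24 hours apart, each count would approximate the activity of signal “a” during one 24-hour period.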

The above configurations may be combined to record multiple signals sequentially regardless of relative strength and also to record the strongest signal based on competing HDR templates. There may be multiple classes of HDR templates with each class having multiple different HDR templates transcribed in response to different signals. For example, there may be two classes of HDR templates XaYYaX and YbXXbY. Because these two HDR templates integrate into the target site created by addition of the other (i.e., the template that integrates into XX adds the target site YY and the template that integrates into YY adds the target site XX) they will alternate. Thus, the DNA will incorporate first an HDR template from the “a” class then an HDR template from the “b” class. Each class of HDR template includes two (but may include any number) HDR templates with partially different sequences that correspond to different signals. Thus, a signal “a₁” may cause increased expression of the HDR template Xa₁YYa₁X and a signal “a₂” may cause increased expression of the HDR template Xa₂YYa₂X. Similarly, a signal “b₁” may cause increased expression of the HDR template Yb₁XXb₁Y and a signal “b₂” may cause increased expression of the HDR template Yb₂XXb₂Y. If the cell begins with DNA that includes the insertion site XX, then first one of the “a” HDR templates will be integrated based on the relative concentrations of the Xa₁YYa₁X and of the Xa₂YYa₂X HDR templates. Doing so creates a YY insertion site and is followed by integrating one of the “b” HDR templates again based on relative concentrations.

In one implementation, each class of the HDR template may record values associated with a particular type of molecular event. For example, the “a” class of HDR templates may indicate temperature experienced by the cell with Xa₁YYa₁X expressed if the temperature is below 32° C. and Xa₂YYa₂X expressed if the temperature is above 42° C. Thus, integration of the “a” class of HDR templates creates a record of relative temperature. The “b” class of HDR templates may be associated with a different type of signal such as salinity. The HDR template Yb₁XXb₁Y may be expressed when the cell is in an environment with salinity below 0.600 M and Yb₂XXb₂Y may be expressed when the cell is in an environment with salinity above 0.700 M. Thus, the record created in the DNA of this cell shows temperature high/low and salinity high/low. Each is recorded in turn so there is a log created over time showing changes in two different signals. Of course, any number of different gradations or levels of variables may be tracked by having distinct HDR templates under the control of appropriate promoters.
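The threshold logic in this example can be summarized in a short sketch. The function name is hypothetical and the template names are written in plain ASCII; the sketch only restates the example thresholds given above and does not model promoter behavior.

```python
def expressed_templates(temperature_c, salinity_m):
    """Return the HDR templates expressed under the example thresholds.

    Thresholds follow the example in the text: below 32 C or above 42 C for
    the "a" class, below 0.600 M or above 0.700 M for the "b" class.
    No template of a class is expressed between its two thresholds.
    """
    templates = []
    if temperature_c < 32:
        templates.append("Xa1YYa1X")  # low-temperature template
    elif temperature_c > 42:
        templates.append("Xa2YYa2X")  # high-temperature template
    if salinity_m < 0.600:
        templates.append("Yb1XXb1Y")  # low-salinity template
    elif salinity_m > 0.700:
        templates.append("Yb2XXb2Y")  # high-salinity template
    return templates

print(expressed_templates(30, 0.75))  # ['Xa1YYa1X', 'Yb2XXb2Y']
print(expressed_templates(37, 0.65))  # []
```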

In one example implementation, using Cas9 as the nuclease with a PAM sequence of NNNNGATT, three target sites may be:

X₁ = TAGCCGTATCGAGCATCGATG | CGCNNNNGATT = X₂
Y₁ = GATCGATGGACTCTGCATCTA | TCGNNNNGATT = Y₂
Z₁ = CGGGACGATCGATCGGGCTAG | ACTNNNNGATT = Z₂

Here the PAM sequence is the NNNNGATT at the 3′ end of each of X₂, Y₂, and Z₂; X₁ is (SEQ ID NO: 1), X₂ is (SEQ ID NO: 2), Y₁ is (SEQ ID NO: 3), Y₂ is (SEQ ID NO: 4), Z₁ is (SEQ ID NO: 5), and Z₂ is (SEQ ID NO: 6). Note that each of X₁, Y₁, and Z₁ is 21 nt long.

Each of the target sites is recognized by a corresponding guide ssDNA that cuts the dsDNA at the location indicated by the “^” below. They should have a trans-activating crRNA (tracrRNA) that is a small trans-encoded RNA for attaching to Cas9 appended to the end. The crRNAs are incorporated into effector complexes, where the crRNA guides the complex to the target site and the Cas proteins create a DSB in the polynucleotide. The respective ssDNA sequences are:

(SEQ ID NO: 1) gX₁ = TAGCCGTATCGAGCATCGATG ^ CGC
(SEQ ID NO: 3) gY₁ = GATCGATGGACTCTGCATCTA ^ TCG
(SEQ ID NO: 5) gZ₁ = CGGGACGATCGATCGGGCTAG ^ ACT

Then a homology directed repair sequence of X₁Y₁Y₂X₂ is:

TAGCCGTATCGAGCATCGATG|GATCGATGGACTCTGCATCTA|TCGNNNNGATT|CGCNNNNGATT (SEQ ID NO: 7)

and a homology directed repair sequence of Y₁X₁X₂Y₂ is:

GATCGATGGACTCTGCATCTA|TAGCCGTATCGAGCATCGATG|CGCNNNNGATT|TCGNNNNGATT (SEQ ID NO: 8).

Other homology directed repair sequences can be designed according to the same pattern.

An initial cut of the target site X₁X₂ will create a DSB that appears as (only one strand of the dsDNA is shown):

(SEQ ID NO: 1) . . . TAGCCGTATCGAGCATCGATG     (SEQ ID NO: 2) CGCNNNNGATT . . .

After HDR with X₁Y₁Y₂X₂, one strand of the dsDNA will have the following sequence that now includes the target site Y₁Y₂ as its two middle segments:

(SEQ ID NO: 7) TAGCCGTATCGAGCATCGATG | GATCGATGGACTCTGCATCTA ∥ TCGNNNNGATT | CGCNNNNGATT.

The dsDNA is now able to be cut by a Cas9 carrying the guide gY₁, creating a DSB at the location represented by “∥”. HDR may be performed with Y₁X₁X₂Y₂, for example, further adding to the dsDNA and completing another iteration of encoding. This may be continued with various sequences of cuts and HDR templates to record any series of molecular events.
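The two iterations above can be checked with a small string-level sketch. It treats the listed sequences as plain text (including the literal “N” placeholders), models HDR as insertion of the template's middle region at the cut between the two portions of a target site, and uses a hypothetical helper name `hdr_insert`; it is not a model of the underlying biochemistry.

```python
# Target-site portions as listed above (SEQ ID NOs: 1-4).
X1 = "TAGCCGTATCGAGCATCGATG"
X2 = "CGCNNNNGATT"
Y1 = "GATCGATGGACTCTGCATCTA"
Y2 = "TCGNNNNGATT"

def hdr_insert(dna, first, second, middle):
    """Insert `middle` at the cut between `first` and `second` in `dna`."""
    site_start = dna.find(first + second)
    if site_start < 0:
        raise ValueError("target site not present in the DNA")
    cut = site_start + len(first)
    return dna[:cut] + middle + dna[cut:]

# First iteration: cut X1|X2 and repair with the template X1 Y1 Y2 X2.
step1 = hdr_insert(X1 + X2, X1, X2, Y1 + Y2)
print(step1 == X1 + Y1 + Y2 + X2)  # True; matches SEQ ID NO: 7

# Second iteration: cut Y1|Y2 and repair with the template Y1 X1 X2 Y2.
step2 = hdr_insert(step1, Y1, Y2, X1 + X2)
print(step2 == X1 + Y1 + X1 + X2 + Y2 + X2)  # True
```

After the second iteration the X₁X₂ target site is again present at the center, so a further integration into that site is possible.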

Signaling Pathways

FIG. 3 shows a diagram 300 of an illustrative signaling pathway that regulates expression of a gene. The signaling pathway may be an engineered signaling pathway that is created or modified in some way to be different from a wild-type signaling pathway. The signaling pathway controls the expression of a gene 302 that is under the control of a promoter 304 and may also be under the control of an operator 306. A promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA (towards the 5′ region of the sense strand). Illustrative promoters are described below. The sequence of the promoter region controls the binding of the RNA polymerase and transcription factors. An operator is a segment of DNA to which a repressor binds to decrease or stop gene expression. A “transcription factor” is a protein that binds near the beginning of the coding sequence (transcription start site) for a gene or functional mRNA. Transcription factors are necessary for recruiting RNA polymerase to transcribe DNA. A transcription factor can function as a repressor, which can bind to the operator to prevent transcription. The gene 302, the promoter 304, and the operator 306 are on a dsDNA molecule that may be genomic DNA of a cell or other DNA such as a plasmid or vector. In some implementations, the promoter 304 may respond to signals such as temperature or pH and thus the promoter 304 itself may be the signaling pathway.

The repressor (and/or “knockdown”) may be a protein or mRNA (small hairpin loops (shRNA), interfering mRNA (RNAi or siRNA)) that binds to DNA/RNA and either blocks attachment of the polymerase to the promoter, blocks elongation of the polymerase during transcription, or blocks translation of the mRNA. In addition to repressors, the CRISPR/Cas9 system itself may be used for sequence-specific repression of gene expression in prokaryotic and eukaryotic cells. Specifically, the technique of CRISPR interference (CRISPRi) uses catalytically dead Cas9 (dCas9) lacking endonuclease activity to regulate genes in an RNA-guided manner. Catalytically inactive Cas9 may be created by introducing point mutations into the Cas9 protein such as at the two catalytic residues (D10A and H840A) of the gene encoding Cas9. In doing so, dCas9 is unable to cleave dsDNA but retains the ability to target DNA. Targeting specificity for CRISPRi is determined by complementary base pairing of a guide RNA (gRNA) to the genomic loci. The gRNA may be designed to target a specific promoter. The complex of catalytically dead Cas9 and the gRNA will block activation of the promoter and turn off expression of any gene regulated by that promoter.

The signaling pathway may include a signaling cascade 308 that carries a signal from a first messenger (i.e., the initial signal) and eventually results in activation, or alternatively suppression, of either the promoter 304 or the operator 306. The initial signal that sets the signaling cascade 308 into action may be an internal or external signal. The signaling pathway may be a trans-membrane signaling pathway that includes an external receptor 310 which detects extracellular signals and communicates the signal across a membrane 312. The membrane 312 may be a cell wall, lipid bilayer, artificial cell wall, or synthetic membrane.

In one implementation, the external receptor 310 may be a G protein-coupled receptor (GPCR). GPCRs constitute a large protein family of receptors that sense molecules outside the membrane 312 and activate the signaling cascade 308 and, ultimately, cellular responses. The GPCR is activated by an external signal in the form of a ligand or other signal mediator. This creates a conformational change in the GPCR, causing activation of a G protein. Further effect depends on the type of G protein. G proteins are subsequently inactivated by GTPase activating proteins, known as RGS proteins. The ligands that bind and activate these GPCRs include light-sensitive compounds, odors, pheromones, hormones, neurotransmitters, etc. and vary in size from small molecules to peptides to large proteins. When a ligand binds to the GPCR it causes a conformational change in the GPCR, which allows it to act as a guanine nucleotide exchange factor (GEF). The GPCR can then activate an associated G protein by exchanging its bound GDP for a GTP. The G protein's α subunit, together with the bound GTP, can then dissociate from the β and γ subunits to further affect intracellular signaling proteins or target functional proteins directly depending on the α subunit type.

In one implementation, the external receptor 310 may be a photosensitive membrane protein. Photoreceptor proteins are light-sensitive proteins involved in the sensing and response to light in a variety of organisms. Photoreceptor proteins typically consist of a protein moiety and a non-protein photopigment that reacts to light via photoisomerization or photoreduction, thus initiating a change of the receptor protein that triggers the signaling cascade 308. Pigments found in photoreceptors include retinal (retinylidene proteins, for example rhodopsin in animals), flavin (flavoproteins, for example cryptochrome in plants and animals) and bilin (biliproteins, for example phytochrome in plants). One example of engineered use of light-sensitive proteins is found in Tamsir, A. et al., Robust Multicellular Computing Using Genetically Encoded NOR Gates and Chemical ‘Wires’, 469 Nature 214 (2011).

The external receptor 310, in some implementations, may also be a membrane-bound immunoglobulin (mIg). A membrane-bound immunoglobulin is the membrane-bound form of an antibody. Membrane-bound immunoglobulins are composed of surface-bound IgD or IgM antibodies and associated Ig-α and Ig-β heterodimers, which are capable of signal transduction through the signaling cascade 308 in response to activation by an antigen.

In one implementation, the external receptor 310 may be a Notch protein. The Notch protein spans the cell membrane, with part of it inside and part outside. Ligand proteins binding to the extracellular domain induce proteolytic cleavage and release of the intracellular domain, which enters the cell to modify gene expression. The receptor may be triggered via direct cell-to-cell contact, in which the transmembrane proteins of the cells in direct contact form the ligands that bind the Notch receptor. Signals generated by the Notch protein may be carried to an operon by the Notch cascade, which consists of Notch and Notch ligands as well as intracellular proteins transmitting the Notch signal.

In one implementation, temperature may activate the signaling pathway. Thus, by altering the temperature, expression of the gene 302 may be up- or down-regulated. Temperature sensing molecules that occur naturally in single-celled organisms include heat shock proteins and certain RNA regulatory molecules, such as riboswitches. Heat shock proteins are proteins that are involved in the cellular response to stress. One example of a heat shock protein that responds to temperature is the bacterial protein DnaK. Temperatures elevated above normal physiological range can cause DnaK expression to become up-regulated. DnaK and other heat shock proteins can be utilized for engineered pathways that respond to temperature. Riboswitches are a type of RNA molecule that can respond to temperature in order to regulate protein translation. An example of a temperature-regulated engineered pathway that has utilized a riboswitch can be found in Neupert, J. et al., Design of simple synthetic RNA thermometers for temperature-controlled gene expression in Escherichia coli, 36(19) Nucleic Acids Res. e124 (2008). Another example of a temperature-sensitive molecule that can be utilized to regulate engineered cell pathways is a temperature-sensitive mutant protein. Single mutations can be made to proteins, which cause the proteins to become unstable at high temperatures, yet remain functional at lower temperatures. Methods for synthesizing temperature-sensitive mutant proteins can be found in Ben-Aroya, S. et al., Making Temperature-Sensitive Mutants, 470 Methods Enzymology 181 (2010). An example of a temperature-controlled engineered pathway that utilizes a temperature-sensitive mutant can be found in Hussain, F. et al., Engineered temperature compensation in a synthetic genetic clock, 111(3) PNAS 972 (2014).

In one implementation, ion concentration or pH may activate the signaling pathway. With signaling pathways of this type, placing a cell in a different ionic environment or altering pH surrounding the cell may be used to control the availability of a given HDR template or enzyme. Examples of cellular sensing molecular mechanisms that detect ionic strength or pH include many viral proteins, such as herpes simplex virus gB, rubella virus envelope protein, influenza hemagglutinin, and vesicular stomatitis virus glycoprotein. An example of a natural cellular pathway that is regulated by pH is penicillin production by Aspergillus nidulans as described in Espeso, E. et al., pH Regulation is a Major Determinant in Expression of a Fungal Penicillin Biosynthetic Gene, 12(10) EMBO J. 3947 (1993). Another example of a pH-sensitive molecule that can be utilized to regulate engineered cell pathways is a pH-sensitive mutant protein. Single mutations can be made to proteins, which can cause the proteins to become less stable in either acidic or basic conditions. For example, pH-sensitive antibodies can bind to an antigen at an optimal pH, but are unable to bind to an antigen at a non-optimal pH. A technique for creating pH-sensitive antibodies that can be used for engineered signaling pathways can be found in Schroter, C. et al., A generic approach to engineer antibody pH-switches using combinatorial histidine scanning libraries and yeast display, 7(1) MAbs 138 (2015). These and other similar sensing mechanisms may be engineered to affect the behavior of a promoter 304 or operator 306.

The gene 302 encodes a gene product 314 that may ultimately be the basis for a number of components in an HDR system. For example, the gene product 314 may be translated into protein, used directly as RNA, or reverse transcribed into DNA. In one implementation, the gene product 314 may be translated into a nuclease 316 that creates DSBs such as, for example, enzyme 104 shown in FIG. 1, or enzyme 200 shown in FIG. 2. The nuclease 316 may be a Cas enzyme such as Cas9, Cas1, or Cas2.

For example, the S. pyogenes Cas9 system from the Clustered Regularly-Interspaced Short Palindromic Repeats-associated (CRISPR-Cas) family is an effective genome engineering enzyme that catalyzes double-stranded breaks and generates mutations at DNA loci targeted by a gRNA. The native gRNA is composed of a 20 nt Specificity Determining Sequence (SDS), which specifies the DNA sequence to be targeted, and is immediately followed by an 80 nt scaffold sequence, which associates the gRNA with Cas9. In addition to sequence homology with the SDS, targeted DNA sequences must possess a Protospacer Adjacent Motif (PAM) (5′-NGG-3′) immediately adjacent to their 3′-end in order to be bound by the Cas9-sgRNA complex and cleaved. When a double-stranded break is introduced in the target DNA locus in the genome, the break is repaired by either homologous recombination (when a repair template is provided) or error-prone non-homologous end joining (NHEJ) DNA repair mechanisms, resulting in mutagenesis of the targeted locus. Even though the normal DNA locus encoding the gRNA sequence is perfectly homologous to the gRNA, it is not targeted by the standard Cas9-gRNA complex because it does not contain a PAM.

In a wild-type CRISPR/Cas system, gRNA is encoded genomically or episomally (e.g., on a plasmid). Following transcription, the gRNA forms a complex with Cas9 endonuclease. This complex is then “guided” by the specificity determining sequence (SDS) of the gRNA to a DNA target sequence, typically located in the genome of a cell. For Cas9 to successfully bind to the DNA target sequence, a region of the target sequence must be complementary to the SDS of the gRNA sequence and must be immediately followed by the correct protospacer adjacent motif (PAM) sequence (e.g., “NGG”). Thus, in a wild-type CRISPR/Cas9 system, the PAM sequence is present in the DNA target sequence but not in the gRNA sequence (or in the sequence encoding the gRNA).

The PAM sequence is typically a sequence of nucleotides located adjacent to (e.g., within 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nt of) an SDS sequence. A PAM sequence is “immediately adjacent to” an SDS sequence if the PAM sequence is contiguous with the SDS sequence (that is, if there are no nucleotides located between the PAM sequence and the SDS sequence). In some implementations, a PAM sequence is a wild-type PAM sequence. Examples of PAM sequences include, without limitation, NGG, NGR, NNGRR (T/N), NNNNGATT, NNAGAAW, NGGAG, NAAAAC, AWG, and CC. In some implementations, a PAM sequence is obtained from Streptococcus pyogenes (e.g., NGG or NGR). In some implementations, a PAM sequence is obtained from Staphylococcus aureus (e.g., NNGRR (T/N)). In some implementations, a PAM sequence is obtained from Neisseria meningitidis (e.g., NNNNGATT). In some implementations, a PAM sequence is obtained from Streptococcus thermophilus (e.g., NNAGAAW or NGGAG). In some implementations, a PAM sequence is obtained from Treponema denticola (e.g., NAAAAC). In some implementations, a PAM sequence is obtained from Escherichia coli (e.g., AWG). In some implementations, a PAM sequence is obtained from Pseudomonas aeruginosa (e.g., CC). Other PAM sequences are contemplated. A PAM sequence is typically located downstream (i.e., 3′) from the SDS, although in some embodiments a PAM sequence may be located upstream (i.e., 5′) from the SDS.
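The PAM patterns listed above use standard IUPAC ambiguity codes (N for any base, R for A or G, W for A or T). A minimal sketch of how such a pattern could be matched against a DNA string is shown below. It assumes the common arrangement in which the PAM lies immediately 3′ of a 20 nt SDS-matching region on the searched strand, and it scans only that one strand; the function names are hypothetical.

```python
import re

# IUPAC codes needed for the PAM examples in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any base
    "R": "[AG]",    # purine
    "W": "[AT]",    # weak pair
}

def pam_to_regex(pam):
    """Translate a PAM written with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in pam)

def find_target_sites(seq, pam, sds_len=20):
    """Return start indices of sds_len-nt regions immediately followed by the PAM."""
    pam_re = re.compile(pam_to_regex(pam))
    hits = []
    for start in range(len(seq) - sds_len - len(pam) + 1):
        pam_start = start + sds_len
        if pam_re.fullmatch(seq[pam_start:pam_start + len(pam)]):
            hits.append(start)
    return hits

print(pam_to_regex("NNGRR"))                                   # [ACGT][ACGT]G[AG][AG]
print(find_target_sites("A" * 20 + "TGG" + "CC", "NGG"))       # [0]
print(find_target_sites("C" * 20 + "ACGTGATT", "NNNNGATT"))    # [0]
```

A fuller search would also scan the reverse complement and handle PAMs located 5′ of the SDS, which some embodiments above contemplate.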

In one implementation, the gene product 314 encodes for gRNA 318 that is used by the nuclease 316 to target a specific DNA sequence. The system may be designed to have all components needed for performing HDR other than the gRNA 318. Thus, transcription of the gRNA in response to a signal provides the last component needed to perform HDR and results in incorporation of an HDR template, thereby creating a log of the molecular event. Alternatively, the gRNA 318 may be used not to cut dsDNA but to turn off a promoter through use of CRISPRi guide RNA. CRISPRi guide RNA directs the nuclease 316 to bind to the promoter 304 and prevent transcription of the gene 302. In this design, the presence of a signal would stop the insertion of a particular HDR template.

A gRNA is a component of the CRISPR/Cas system. A “gRNA” (guide ribonucleic acid) herein refers to a fusion of a CRISPR-targeting RNA (crRNA) and a trans-activating crRNA (tracrRNA), providing both targeting specificity and scaffolding/binding ability for Cas9 nuclease. A “crRNA” is a bacterial RNA that confers target specificity and requires tracrRNA to bind to Cas9. A “tracrRNA” is a bacterial RNA that links the crRNA to the Cas9 nuclease and typically can bind any crRNA. The sequence specificity of a Cas DNA-binding protein is determined by gRNAs, which have nucleotide base-pairing complementarity to target DNA sequences. Thus, Cas proteins are “guided” by gRNAs to target DNA sequences. The nucleotide base-pairing complementarity of gRNAs enables, in some embodiments, simple and flexible programming of Cas binding. Nucleotide base-pair complementarity refers to distinct interactions between adenine and thymine (DNA) or uracil (RNA), and between guanine and cytosine. In some embodiments, a gRNA is referred to as a stgRNA. A “stgRNA” is a gRNA that complexes with Cas9 and guides the stgRNA/Cas9 complex to the template DNA from which the stgRNA was transcribed.

The length of a gRNA may vary. In some embodiments, a gRNA has a length of 20 nucleotides to 200 nucleotides, or more. For example, a gRNA may have a length of 20 to 175, 20 to 150, 20 to 100, 20 to 95, 20 to 90, 20 to 85, 20 to 80, 20 to 75, 20 to 70, 20 to 65, 20 to 60, 20 to 55, 20 to 50, 20 to 45, 20 to 40, 20 to 35, or 20 to 30 nt.

In one implementation, the gene product 314 may itself be or may encode for an HDR template 320. The HDR template 320 may be, for example, the HDR template 108 shown in FIG. 1 or the HDR template 204 shown in FIG. 2. The gene product 314, although it is a ssRNA, may be capable of functioning as an HDR template 320 due to the ability of RNA to hybridize with DNA. RNA transcript-mediated HDR has been shown to function successfully in eukaryotic cells. See Keskin, H., Shen, Y. et al., Transcript-RNA-templated DNA recombination and repair, 515 Nature 436 (2014) and Storici, F. et al., RNA-templated DNA repair, 447 Nature 338 (2007). If RNA is used as the HDR template, the cell may be further modified to reduce or remove enzymes that degrade RNA-DNA hybrids. In one implementation, the cell using RNA as the HDR template may be S. cerevisiae. Additionally, complementary DNA (cDNA), resulting from reverse-transcription of mRNA, and/or transcript RNA itself may aid DSB repair via HDR. Moreover, splicing of both expressed RNA and potentially of mRNA can change the sequence of RNA that serves as a template for reverse transcriptase to synthesize cDNA. Thus, the cDNA used as an HDR template may have a different sequence, due to splicing, than genomic or other DNA encoding the initial RNA transcript. The gene product 314 may also be converted to ssDNA by reverse transcriptase and used as the HDR template 320 in the form of DNA.

The gene product 314 may also be translated into some other enzyme product 322. The other enzyme product 322 represents another enzyme that may be used for logging of molecular events through HDR. Both DNA Taq polymerase and DNA ligase are examples of other enzyme products used for performing HDR. In a system that lacks one or both of these enzymes, regulated addition through control of gene expression is a way to regulate the ability to perform HDR. Other enzymes such as transcription factors are another type of other enzyme products 322. Transcription factors expressed from a first gene may be used to activate the promoter or operator of a second gene. There may be greater need for addition of other enzyme products 322 in a cell-free system or in a minimal cell than in a biological cell that includes wild-type cellular machinery.

FIG. 4 shows a diagram 400 of two illustrative signaling pathways that create different gene products at levels responsive to strengths of the respective signals. A first signaling pathway 402 responds to a first signal 404 by increasing activity of a first promoter 406 which controls transcription of a first gene 408. The first signaling pathway 402 and the first signal 404 may be any of the signaling pathways or types of signals discussed in this disclosure. The first gene 408 creates a first gene product 410 that may be any of the types of gene products shown in FIG. 3. For purposes of illustration, the first gene product 410 is shown as encoding a first HDR template 412. Thus, an increase in the first signal 404 leads to an increase in the synthesis of the first HDR template 412.

Similarly, a second signaling pathway 414 is responsive to a second signal 416 by increasing activity of a second promoter 418 which controls transcription of a second gene 420. The second gene 420 encodes a second gene product 422. The second gene product 422 may be any of the types of gene products discussed in FIG. 3. The second gene product 422 may be the same or a different type of gene product than the first gene product 410. In this diagram 400, the second gene product 422 is shown as a second HDR template 424. The amount of the second HDR template 424 is thus regulated by the strength of the second signal 416.

If, for example, the second signal 416 is stronger and/or more frequent than the first signal 404, the cell will create a greater number of copies of the second HDR template 424 than of the first HDR template 412. The respective signaling pathways 402, 414 and the promoters 406, 418 may be selected to maintain a similar ratio of correspondence between respective signal strengths and synthesis of HDR templates 412, 424. For example, the respective signaling pathways 402, 414 may be the same except for the portion of the signaling pathway directly involved in sensing the primary signal. The promoters 406, 418 may also be similar and different only in one aspect such as the specific transcription factor used to activate the promoter.

In this example, the second HDR template 424 is present at a concentration that is twice as much as the first HDR template 412. This indicates that the second signal 416 is approximately twice as strong as the first signal 404. Because the concentration of the second HDR template 424 is twice that of the first HDR template 412, for each HDR event it is twice as likely that the second HDR template 424 will be integrated into a section of dsDNA 426. Thus, over a prolonged period of iterative integration of HDR templates, it is likely that a sequence 428 from the second HDR template 424 will be twice as common as a sequence 430 from the first HDR template 412. The dsDNA 426 may include, for example, a target site 432 into which either the first HDR template 412 or the second HDR template 424 may be inserted. The relative amount of integration of the sequence 428 from the second HDR template 424 and the sequence 430 of the first HDR template 412 into the dsDNA 426 reflects the relative concentrations of the first HDR template 412 and the second HDR template 424. Specifically, in this example, the sequence 428 of the second HDR template 424 is present twice as often as the sequence 430 from the first HDR template 412. Thus, the first HDR template 412 and the second HDR template 424 integrate into the dsDNA 426 in proportion to their respective concentrations.
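The proportional relationship described above can be illustrated numerically. The following Python sketch is illustrative only and rests on a simplifying assumption taken from the preceding paragraph: at each HDR event, the probability that a given template is integrated is directly proportional to its concentration. The function names, the template labels, and the concentration values are hypothetical.

```python
import random

def expected_fractions(concentrations):
    """Expected share of integrations per template, proportional to concentration."""
    total = sum(concentrations.values())
    return {name: amount / total for name, amount in concentrations.items()}

def simulate_integrations(concentrations, events, seed=0):
    """Simulate `events` independent HDR events and count integrations per template.

    Assumption: each event picks one template with probability proportional
    to its concentration; real integration efficiencies may differ.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    names = list(concentrations)
    weights = [concentrations[name] for name in names]
    picks = rng.choices(names, weights=weights, k=events)
    return {name: picks.count(name) for name in names}
```

With relative concentrations of 1 and 2 for the first and second HDR templates, the expected shares are one third and two thirds, and a long simulated run yields roughly twice as many integrations of the second template as of the first.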

If the strength of one or more of signals 404, 416 in this example system changes over time, then the relative concentrations of the corresponding HDR templates 412, 424 will also change. This change over time may be observed by analyzing the sequence of the dsDNA 426 and observing throughout different portions of that sequence how the ratio of the sequence 428 of the second HDR template 424 to the sequence 430 of the first HDR template 412 varies. This temporal analysis may be implemented, for example, by analyzing a sliding window of nucleotides of the dsDNA 426 and counting the number of times the sequence 428 from the second HDR template 424 is found and the number of times the sequence 430 of the first HDR template 412 is found. The sliding window may be any length such as, for example, 500 nt, 1000 nt, 5000 nt, etc.
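The sliding-window analysis described above may be sketched as follows. This Python example is a minimal illustration and not a definitive implementation; it assumes the sequence data is available as a plain string of nucleotides, that exact matching is sufficient (no sequencing errors), and that the window advances by a fixed step. The function names and the `step` parameter are hypothetical.

```python
def count_occurrences(text, motif):
    """Count occurrences of motif in text, including overlapping ones."""
    count = 0
    start = text.find(motif)
    while start != -1:
        count += 1
        start = text.find(motif, start + 1)
    return count

def windowed_counts(dna, seq_first, seq_second, window=1000, step=500):
    """Return a list of (window_start, count_first, count_second) tuples.

    Each tuple reports how often the two template-derived sequences occur
    within dna[window_start:window_start + window].
    """
    results = []
    last_start = max(len(dna) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = dna[start:start + window]
        results.append((start,
                        count_occurrences(chunk, seq_first),
                        count_occurrences(chunk, seq_second)))
    return results
```

Comparing the two counts from window to window indicates how the ratio of the two integrated sequences, and thus the relative strength of the two signals, changed along the recorded sequence.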

FIG. 5 shows an illustrative cell 500 that is capable of heritably storing a log of events experienced by the cell 500. The cell 500 may be an E. coli cell, a Saccharomyces cerevisiae cell, or a cell from another single-celled organism. It may also be a cell from a multi-cellular organism grown in culture. Some human cell lines that may be used for cell culture include DU145, H295R, HeLa, KBM-7, LNCaP, MCF-7, MDA-MB-468, PC3, SaOS-2, SH-SY5Y, T47D, THP-1, U87, and National Cancer Institute's 60 cancer cell line panel (NCI60).

The cell 500 may contain a dsDNA molecule 502 that has a first target site 504. The cell 500 may also contain a first enzyme 506 that is configured to create a DSB at a cut site within the first target site 504. For example, the first enzyme 506 may be a CRISPR/Cas system comprising a gRNA 508 that includes a spacer region (also called a proto-spacer element or targeting sequence) of about 20 nt that is complementary to one strand of the dsDNA 502 at the first target site 504.

The dsDNA molecule 502 may also include a promoter 510 and a gene encoding an HDR template 512 such as HDR template 514 shown in this figure.

The dsDNA molecule 502 may be a vector or plasmid introduced to the cell 500 by any suitable method. A “vector” is a polynucleotide molecule, such as a DNA molecule derived, for example, from a plasmid, bacteriophage, yeast or virus, into which a polynucleotide can be inserted or cloned. One type of vector is a “plasmid,” which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication defective retroviruses, lentiviruses, replication defective lentiviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors.” Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. Plasmids suitable for expressing embodiments of the present invention, methods for inserting nucleic acid sequences into a plasmid, and methods for delivering recombinant plasmids to cells of interest are known in the art.

A vector may contain one or more unique restriction sites and can be capable of autonomous replication in a defined host cell including a target cell or tissue or a progenitor cell or tissue thereof (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors), or be integrable with the genome of the defined host such that the cloned sequence is reproducible (e.g., non-episomal mammalian vectors). Accordingly, the vector can be an autonomously replicating vector, i.e., a vector that exists as an extra-chromosomal entity, the replication of which is independent of chromosomal replication, e.g., a linear or closed circular plasmid, an extra-chromosomal element, a mini-chromosome, or an artificial chromosome. The vector can contain any means for assuring self-replication. Alternatively, the vector can be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. Such a vector may comprise specific sequences that allow recombination into a particular, desired site of the host chromosome. A vector system can comprise a single vector or plasmid, two or more vectors or plasmids, which together contain the total DNA to be introduced into the genome of the host cell, or a transposon. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vector can include a reporter gene, such as a green fluorescent protein (GFP), which can be either fused in frame to one or more of the encoded polypeptides, or expressed separately. The vector can also include a selection marker such as an antibiotic resistance gene that can be used for selection of suitable transformants.

Several aspects of the invention relate to vector systems comprising one or more vectors, or vectors as such. Vectors can be designed for expression of transcripts (e.g., nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells. For example, transcripts can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells, or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods In Enzymology, 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Vectors may be introduced and propagated in a prokaryote. In some embodiments, a prokaryote is used to amplify copies of a vector to be introduced into a eukaryotic cell or as an intermediate vector in the production of a vector to be introduced into a eukaryotic cell (e.g., amplifying a plasmid as part of a viral vector packaging system). Expression of proteins in prokaryotes is most often carried out in E. coli with vectors containing constitutive or inducible promoters directing the expression of proteins. Examples of suitable inducible E. coli expression vectors include pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., Gene Expression Technology: Methods In Enzymology 185, Academic Press, San Diego, Calif. (1990) 60-89).

In some embodiments, a vector is a yeast expression vector. Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSec1 (Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30: 933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corp., San Diego, Calif.).

In some embodiments, a vector is capable of driving expression of one or more sequences in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6: 187-195). For other suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

Appropriate DNA segments may be inserted into a vector by a variety of procedures. In general, DNA sequences may be inserted into an appropriate restriction endonuclease site(s) by procedures known in the art, which may be performed without undue experimentation by a skilled artisan. A DNA segment in an expression vector may be operatively linked to an appropriate expression control sequence(s) (i.e., a promoter such as 510) to direct synthesis. As used herein, a “promoter” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3′ direction) coding or non-coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Various promoters, including inducible promoters, may be used to drive the various vectors of the present invention. A promoter may also contain sub-regions at which regulatory proteins and molecules may bind, such as RNA polymerase and other transcription factors. Promoters may be constitutive, inducible, activatable, repressible, tissue-specific or any combination thereof.

Promoters may include any promoter known in the art for expression either in vivo or in vitro. Promoters which may be used in embodiments of the present invention may include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g., liver, pancreas), or particular cell types (e.g., lymphocytes). The promoters which may be used in embodiments of the present invention may also be inducible, such that expression may be decreased or enhanced or turned “on” or “off.” For example, promoters which respond to a particular signal (e.g., small molecule, metabolite, protein, molecular modification, ion concentration change, electric charge change, action potential, radiation, UV, and light) may also be used. Additionally, a tetracycline-regulatable system employing any promoter such as, but not limited to, the U6 promoter or the H1 promoter, may be used. By way of example and not of limitation, promoters which respond to a particular stimulus may include, e.g., heat shock protein promoters, and Tet-off and Tet-on promoters.

A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state), it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), or it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).

A promoter drives expression or drives transcription of the nucleic acid sequence that it regulates. Herein, a promoter is considered to be “operably linked” when it is in a correct functional location and orientation in relation to a nucleic acid sequence it regulates to control (“drive”) transcriptional initiation and/or expression of that sequence.

A promoter may be one naturally associated with a gene or sequence, as may be obtained by isolating the 5′ non-coding sequences located upstream of the coding segment of a given gene or sequence. Such a promoter is referred to as an “endogenous promoter.”

In some embodiments, a coding nucleic acid sequence may be positioned under the control of a recombinant or heterologous promoter, which refers to a promoter that is not normally associated with the encoded sequence in its natural environment. Such promoters may include promoters of other genes; promoters isolated from any other cell; and synthetic promoters or enhancers that are not “naturally occurring” such as, for example, those that contain different elements of different transcriptional regulatory regions and/or mutations that alter expression through methods of genetic engineering that are known in the art. In addition to producing nucleic acid sequences of promoters and enhancers synthetically, sequences may be produced using recombinant cloning and/or nucleic acid amplification technology, including polymerase chain reaction (PCR). Contemplated herein, in some embodiments, are RNA pol II and RNA pol III promoters. Promoters that direct accurate initiation of transcription by an RNA polymerase II are referred to as RNA pol II promoters. Examples of RNA pol II promoters for use in accordance with the present disclosure include, without limitation, human cytomegalovirus promoters, human ubiquitin promoters, human histone H2A1 promoters and human inflammatory chemokine CXCL1 promoters. Other RNA pol II promoters are also contemplated herein. Promoters that direct accurate initiation of transcription by an RNA polymerase III are referred to as RNA pol III promoters. Examples of RNA pol III promoters for use in accordance with the present disclosure include, without limitation, a U6 promoter, an H1 promoter and promoters of transfer RNAs, 5S ribosomal RNA (rRNA), and the signal recognition particle 7SL RNA.

Illustrative promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31(17)); a human H1 promoter (H1); and the like.

Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, Isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, Tetracycline-regulated promoter, Steroid-regulated promoter, Metal-regulated promoter, estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.

Cells, such as cells in culture, may be transfected or transformed with the dsDNA molecule 502. Transfection is the process of deliberately introducing naked or purified polynucleotides into eukaryotic animal cells. Transformation refers to DNA transfer in bacteria and non-animal eukaryotic cells, including plant cells. Transfection may be performed using viruses or mechanical methods. Viral transfection introduces foreign DNA into a cell by a virus or viral vector. Transfection with a virus may introduce the DNA into the genome of the host cell. Mechanical transfection typically involves opening transient pores or “holes” in the cell membrane to allow the uptake of material. Transfection can be carried out using calcium phosphate (i.e., tricalcium phosphate), by electroporation, microinjection, gene gun, impalefection, hydrostatic pressure, continuous infusion, sonication, lipofection, nanoparticles containing the dsDNA molecule 502 (e.g., mesoporous silica nanoparticles or gold nanoparticles) or by mixing a cationic lipid with the material to produce liposomes which fuse with the cell membrane and deposit their cargo inside. Nanoparticles used to introduce foreign DNA may be ionically charged or have targeting ligands to deliver to specific cells or sites.

One viral transfection technique for transferring genetic material to hard-to-transfect cells is recombinant adeno-associated virus (AAV) delivery. This is a type of viral transduction that does not integrate into the host genome. AAV-based systems have been used successfully to introduce the gene for S. pyogenes Cas9 (SpCas9) together with its optimal promoter and polyadenylation signal using the AAVpro CRISPR/Cas9 Helper Free System (AAV2) available from Takara Bio USA, Inc.

Conjugation may also be used to introduce the dsDNA molecule 502 into a cell. Although conjugation in nature occurs more frequently in bacteria, transfer of genetic material from bacterial to mammalian cells is also possible. See Waters V. L., Conjugation between bacterial and mammalian cells. 29 (4) Nature Genetics 375 (2001).

The cell 500 may also include a gene 516 under the control of a promoter 518 and an operator 520. The gene 516 may encode a ssRNA sequence 522 comprising a 3′-end sequence 524 and a 5′-end sequence 526. An HDR template 514 may be generated from the gene 516. In one implementation, the HDR template 514 is the ssRNA sequence 522 itself. The 3′-end sequence 524 and the 5′-end sequence 526 are complementary to one strand of a dsDNA molecule 502 over at least part of a target site 504. Homology between these end sequences and the dsDNA allows the ssRNA sequence 522 to hybridize with portions of the dsDNA on either side of a DSB created at a cut site in the target site 504.

In implementations in which the gene 516 directly encodes the HDR template 514, the gene 516 will encode a cut site 528 that may be cut by an enzyme such as the first enzyme 506. Unless protected from the enzyme, the cut site 528 in the gene 516 may be unintentionally cut when the enzyme contacts the gene 516.

One technique for protecting the cut site 528 from the first enzyme 506 is physical separation. In a cell-free system, such as one that uses microfluidics, the gene 516 may be maintained in one chamber and the ssRNA sequence 522 may be moved from the chamber containing the gene 516 into a different chamber where the enzyme 506 is present.

Physical separation may also be used in cellular implementations. The gene 516 and the enzyme 506 may be contained in different cellular chambers. In one implementation, the gene 516 may be in the nucleus and the enzyme may be outside the nucleus in the cytoplasm or in another cellular chamber. The gene 516 may remain in the nucleus if it is part of the cell's genome. A nuclear export signal (NES) may be used to keep the enzyme, or other component of the system, out of the nucleus. An NES is a short amino acid sequence of four hydrophobic residues in a protein that targets it for export from the cell nucleus to the cytoplasm through the nuclear pore complex using nuclear transport. Similarly, a nuclear localization signal (NLS) may be used to keep the enzyme in the nucleus. An NLS is an amino acid sequence that tags a protein for import into the cell nucleus by nuclear transport. Typically, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Different nuclear localized proteins may share the same NLS. An NLS has the opposite function of an NES. Persons of ordinary skill in the art will be able to modify or engineer a protein such as a nuclease or other enzyme to include an NES or an NLS.

The physical location of RNA in a cell may also be controlled. The ssRNA sequence 522 may be exported from its site of transcription in the nucleus to the cytoplasm or other destination outside the nucleus where the enzyme is present. RNA export is described in Sean Carmody and Susan Wente, mRNA Nuclear Export at a Glance, 122 J. of Cell Science 1933 (2009) and Alwin Köhler and Ed Hurt, Exporting RNA from the Nucleus to the Cytoplasm, 8 Nature Reviews Molecular Cell Biology 761 (2007).

Splicing may be used in place of or in addition to physical separation to protect the gene 516 from being cut by the enzyme 506. In one implementation, the gene 516 may include a sequence with a portion that is later removed by splicing. This additional portion changes the sequence of nucleotides in the gene 516 so that there is no cut site 528 present. The ssRNA sequence 522 will become an HDR template 514 through splicing, which also introduces the cut site 528.
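The protective effect of an interrupting portion can be illustrated with a simple string model. The following Python sketch is illustrative only. It assumes that sequences are written in DNA letters throughout (an RNA transcript would contain U in place of T), that the enzyme recognizes the cut site only as a contiguous sequence, and that the removable portion is excised exactly at the stated coordinates. All sequences and names shown are hypothetical and are not taken from the sequence listing.

```python
def splice_out(transcript, start, end):
    """Remove transcript[start:end] and join the two flanking segments."""
    return transcript[:start] + transcript[end:]

# Hypothetical sequences chosen only for illustration.
CUT_SITE = "GATTACAGATTACAGATTACTGG"   # 20-nt target followed by an NGG PAM
INTERRUPTER = "GTAAGTCCCCCCCCCCAG"      # portion later removed by splicing
LEFT_ARM, RIGHT_ARM = "AAAACCCC", "GGGGTTTT"

# In the gene, the interrupting portion splits the cut site in two,
# so the contiguous cut site is absent from the gene itself.
half = len(CUT_SITE) // 2
gene = LEFT_ARM + CUT_SITE[:half] + INTERRUPTER + CUT_SITE[half:] + RIGHT_ARM

# Splicing removes the interrupting portion and thereby creates the cut site.
start = len(LEFT_ARM) + half
spliced = splice_out(gene, start, start + len(INTERRUPTER))
```

In this model the contiguous cut site is absent from `gene` but present in `spliced`, mirroring how splicing both protects the gene 516 and introduces the cut site 528 into the HDR template.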

Alternative splicing, or differential splicing, is a regulated process during gene expression that results in a single gene coding for multiple proteins. In this process, particular exons of a gene may be included within or excluded from the final, processed mRNA produced from that gene. Consequently, the proteins translated from alternatively spliced mRNAs will contain differences in their amino acid sequence and, often, in their biological functions. The production of alternatively spliced mRNAs is regulated by a system of trans-acting proteins that bind to cis-acting sites on the primary transcript itself. Such proteins include splicing activators that promote the usage of a particular splice site, and splicing repressors that reduce the usage of a particular site. There are multiple types of alternative splicing including exon skipping, mutually exclusive exons, alternative donor sites, alternative acceptor sites, and intron retention. Exon skipping is one way to cause splicing in the ssRNA sequence 522; in this case, an exon may be spliced out of the primary transcript. Persons having ordinary skill in the art will understand how to design the gene 516 so that it includes a splice site at a specified location. Alternative splicing may be implemented as a technique to prevent creation of a DSB in the gene 516 even if the gene 516 and enzyme 506 are not physically separated.

Self-excising elements may function similarly to splicing. The gene 516 may be designed to include a region that, when transcribed into RNA, includes one or more self-excising elements. Inclusion of the self-excising elements, for example in a way that disrupts the cut site 528, prevents the gene 516 from being recognized by the enzyme, and the excision converts the ssRNA sequence 522 into the HDR template 514. One type of self-excising element is a ribozyme, which is an RNA enzyme that functions as a reaction catalyst. Ribozymes are RNA sequences that catalyze a (trans-esterification) reaction to remove the ribozyme sequence itself from the rest of the RNA sequence. Essentially these are considered introns, which are intragenic regions spliced from mRNA to produce mature RNA with a continuous exon (coding region) sequence. Self-excising introns/ribozymes consist of group I and group II introns. Many group I introns in bacteria are known to self-splice and maintain a conserved secondary structure comprised of a paired element which uses a guanosine (GMP, GDP, or GTP) cofactor. An example of a group I intron is the Staphylococcus phage twort.ORF143. Group I and group II introns are considered self-splicing because they do not require proteins to initialize the reaction. Self-excising sequences are known and one of ordinary skill in the art will understand how to include a self-excising sequence in the gene 516. Aspects of self-excising ribozymes are shown in In Vivo Protein Fusion Assembly Using Self Excising Ribozyme available at 2011.igem.org/Team:Waterloo (last visited Mar. 3, 2017).

A series of homologous bridges may also be used to generate a recombinant sequence that is the gene template for the ssRNA sequence 522. The homologous bridges may be present in the DNA at various, separate locations so that the gene 516 does not include a cut site 528. This technique is also known as multi-fragment cloning or extension cloning. The final HDR template 514 is made up of transcripts of the multiple overlapping segments. One suitable technique for combining the multiple overlapping fragments into the HDR template 514 is Sequence and Ligation-Independent Cloning (SLIC). This technique is described in Mamie Li and Stephen Elledge, Harnessing Homologous Recombination in vitro to Generate Recombinant DNA Via SLIC, 4 Nature Methods 250 (2007). Another suitable technique for joining multiple overlapping fragments is provided by Jiayuan Quan and Jingdong Tian, Circular Polymerase Extension Cloning of Complex Gene Libraries and Pathways, 4(7) PLoS ONE e6441 (2009).

Methylation may be used to protect HDR templates from premature cutting by restriction enzymes because some restriction enzymes do not cut methylated DNA. Other nucleases such as Cas9 may also be prevented from cutting by methylation of a cutting region or PAM recognition site. DNA methylation is a process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. Two of DNA's four bases, cytosine and adenine, can be methylated. A methylase is an enzyme that recognizes a specific sequence and methylates one of the bases in or near that sequence. Methylation may be controlled by epigenetic editing using a targeting device that is a sequence-specific DNA binding domain which can be redesigned to recognize desired sequences. The targeting device may be fused to an effector domain, which can modify the epigenetic state of the targeted locus. Techniques for using epigenetic editing will be understood by one of ordinary skill in the art. Epigenome manipulations are described in Park, et al., The epigenome: the next substrate for engineering, 17 Genome Biology 183 (2016). HDR templates made of RNA may also be modified by methylation. S. Lin and R. Gregory, Methyltransferases modulate RNA stability in embryonic stem cells, 16(2) Nature Cell Biology 129 (2014).

In one implementation, the HDR template 514 is a ssDNA sequence complementary to the ssRNA sequence 522. The ssDNA sequence may be created by reverse transcriptase (RT) reading the ssRNA sequence 522 and synthesizing a complementary ssDNA sequence. RT is an enzyme used to generate cDNA from an RNA template, a process termed reverse transcription. RT is widely used in the laboratory to convert RNA to DNA for use in procedures such as molecular cloning, RNA sequencing, PCR, and genome analysis. RT enzymes are widely available from multiple commercial sources. Procedures for use of RT are well known to those of ordinary skill in the art.
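The relationship between the ssRNA sequence 522 and its complementary ssDNA can be illustrated with a short sketch. The function name and the example sequence below are hypothetical and are not part of this disclosure; the sketch models only Watson-Crick base pairing and ignores priming, enzyme fidelity, and any other aspect of the enzymatic reaction.

```python
# Illustrative sketch only: models base pairing, not the RT enzyme itself.
# Base that reverse transcriptase would pair with each RNA template base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(ssrna: str) -> str:
    """Return the complementary ssDNA (cDNA), written 5'->3'."""
    rna = ssrna.upper()
    # The cDNA strand is antiparallel to its template, so the RNA is
    # read from its 3' end while the DNA is written from its 5' end.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


# Hypothetical stand-in for the ssRNA sequence 522.
example_rna = "AUGGCUUAA"
example_cdna = reverse_transcribe(example_rna)
print(example_cdna)
```

A sequence containing characters other than A, U, G, or C raises a KeyError, which may be a reasonable way to flag malformed input in a sketch of this kind.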

The 3′-end sequence 530 and the 5′-end sequence 532 of the HDR template 514 are homologous to one strand of the dsDNA 502 over at least a portion of the first target site 504. The HDR template 514, in both ssDNA and ssRNA implementations, includes a middle portion 534 that, when incorporated into the dsDNA 502, acts as a record of a signal detected by the engineered signaling pathway 536. In an implementation, the middle portion 534 also introduces another target site as described elsewhere in this disclosure.

Enzyme 506 is illustrated here as a CRISPR/Cas complex with gRNA 508. Other types of enzymes discussed above may be used instead of the CRISPR/Cas complex. The single-stranded tail of the gRNA 508 may be extended with a sequence complementary to all or part of the HDR template 514. The HDR template 514 may partially hybridize to the tail of the gRNA 508, forming a double-stranded region 538. This brings a copy of the HDR template 514 into close physical proximity with the location of the DSB created by the CRISPR/Cas complex, which can increase HDR efficiency.

The extended tail of the gRNA 508 may also be designed so that it matches the binding domain of a transcription activator-like effector (TALE) protein. The TALE protein may also have a binding domain complementary to the HDR template 514. This will also bring the HDR template into close proximity with the location of the DSB. The tail of the gRNA 508 may be extended to create regions for attachment of multiple copies of the HDR template 514 or TALE proteins.

TALE proteins are proteins secreted by Xanthomonas bacteria via their type III secretion system when the bacteria infect various plant species. These proteins can bind promoter sequences in the host plant and activate the expression of plant genes that aid bacterial infection. They recognize plant DNA sequences through a central repeat domain consisting of a variable number of about 34 amino acid repeats. There appears to be a one-to-one correspondence between the identity of two critical amino acids in each repeat and each DNA base in the target site. The most distinctive characteristic of TAL effectors is a central repeat domain containing between 1.5 and 33.5 repeats that are usually 34 amino acids in length (the C-terminal repeat is generally shorter and referred to as a “half repeat”). A typical repeat sequence may be shared across many TALE proteins but the residues at the 12^(th) and 13^(th) positions are hypervariable (these two amino acids are also known as the repeat variable diresidue or RVD). This simple correspondence between amino acids in TAL effectors and DNA bases in their target sites makes them useful for protein engineering applications.
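The one-to-one correspondence between RVDs and DNA bases can be sketched as a simple lookup. The four pairings below are the commonly cited canonical ones and are supplied here as an assumption rather than taken from this disclosure; actual RVD specificity is less strict (for example, NN is also reported to recognize adenine), and the function names are hypothetical.

```python
# Illustrative sketch only. The RVD-to-base pairings are the commonly
# cited canonical ones and are an assumption, not part of this text;
# real RVDs show some degeneracy (e.g., NN may also bind adenine).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}
BASE_TO_RVD = {base: rvd for rvd, base in RVD_TO_BASE.items()}


def predicted_target(rvds):
    """Predict the DNA target site for an ordered list of RVDs."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)


def design_rvds(target_dna):
    """Pick one RVD per base for a desired DNA target site."""
    return [BASE_TO_RVD[base] for base in target_dna.upper()]


# Hypothetical 8-base target; one repeat is needed per base.
rvds = design_rvds("ACTGGTCA")
print(rvds)
print(predicted_target(rvds))
```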

Subsequent to creation of a DSB in the target site 504, the molecule 538 that has hybridized to the tail of the gRNA 508 may be released. In some implementations, introduction of a nucleotide sequence complementary to the tail of the gRNA 508 or binding domain of the TALE protein may compete with the attached molecule 538 and cause dissociation of the HDR template 514, TALE protein, or other molecule. This competition may cause the HDR template 514 to become available for binding to the dsDNA 502 on either side of the DSB.

The cell 500 may also include one or more engineered signaling pathways 536. As used herein, “engineered signaling pathway” includes any pathway in which at least one portion of the pathway is intentionally modified with molecular biology techniques to be different from the wild type pathway and a signal (intracellular or extracellular) causes a change in a rate of transcription of a gene. The engineered signaling pathway 536 may induce a promoter such as the promoter 510 described above. The engineered signaling pathway 536 may also cause a transcription factor to bind to an operator such as the operator 520 described above and prevent transcription. In one implementation, the gene affected by the engineered signaling pathway 536 may be the gene 516 that encodes for the ssRNA sequence 522. Thus, the engineered signaling pathway 536 may function to control an amount of the HDR template 514 available in the cell 500. In one implementation, the gene affected by the engineered signaling pathway 536 may encode for an enzyme that creates DSBs in dsDNA such as enzyme 506. Thus, the number of enzymes which create DSBs in the target sites 504 may be regulated by the engineered signaling pathway 536. The engineered signaling pathway 536 may control the transcription of genes that encode other proteins associated with HDR.

The cell 500 may include multiple different engineered signaling pathways 536, each responding to a unique signal and each promoting or repressing expression of genes responsible for the creation of the ssRNA sequence 522 and/or enzymes 506. Thus, intracellular or extracellular signals may be used to vary the levels of HDR templates 514 and/or enzymes 506 in the cell 500, thereby changing which target sites 504 are cut and which sequences are used to repair DSBs through HDR. Responding by up- or down-regulating any of multiple promoters and/or operators allows the cell 500 to record a log in its DNA of events and complex interactions of events sensed by engineered signaling pathways. In one implementation, the engineered signaling pathway 536 may include an external receptor 540 that can detect extracellular signals across a membrane 542. The membrane 542 may be a cell wall, lipid bilayer, artificial cell wall, or synthetic membrane.

The cell 500 may also include one or more additional dsDNA molecules 544 that may include a second target site 546. Similar to the first dsDNA molecule 502, the additional dsDNA molecule 544 may include only a single instance of the second target site 546. Alternatively, the additional dsDNA molecule 544 may include multiple copies of the same target site or multiple different target sites. The additional dsDNA molecule 544 may be introduced to the cell 500 by any of the techniques described above. In some implementations, the first dsDNA molecule 502 and the additional dsDNA molecule 544 may be introduced by the same procedure. A ratio of the first dsDNA molecule 502 and the additional dsDNA molecule 544 in the cell 500 may be controlled by regulating the respective copies of the dsDNA molecules added to the cell 500.

The additional dsDNA molecule 544 and the second target site 546 may have identical or similar sequences to the first dsDNA molecule 502 and the first target site 504. Thus, the additional dsDNA molecule 544 may be thought of as a “copy” of the first dsDNA molecule 502 in some implementations. This additional copy of an identical or similar molecule may provide redundancy by creating a second log that, absent errors, will record the same series of events in both dsDNA molecules 502, 544. In one implementation, the additional dsDNA molecule 544 may include a target site 546 with a different sequence than the first target site 504 in the first dsDNA molecule 502. Having different target sites 504, 546 in different dsDNA molecules 502, 544 allows for simultaneous, or alternating, encoding of binary data in two different encoding schemes. The two different encoding schemes may be non-overlapping or “orthogonal” so that the enzymes and HDR templates associated with one encoding scheme do not interact with the dsDNA molecule used for the other encoding scheme. For example, insertion of DNA into the first target site 504 may record the presence of signals related to temperature and insertion of DNA into the second target site 546 may record the presence of signals related to light levels. It is understood that in an actual implementation there may be many hundreds or thousands of dsDNA molecules with respective target sites. There may also be a corresponding number of different encoding schemes and different sequences for the respective target sites for creating a detailed log of multiple different signals.

In an implementation, the additional dsDNA molecule 544 may include an operon 548 that encodes components used for logging molecular events. An operon is a contiguous region of DNA that includes cis-regulatory regions (e.g., repressors, promoters) and the coding regions for one or more genes or functional mRNAs (e.g., siRNA, tracrRNA, gRNA, shRNA, etc.). The operon 548 may be delivered in a circular vector, such as the additional dsDNA molecule 544, or may be inserted into genomic DNA of the cell 500 through gene editing techniques known to those of skill in the art. In an implementation, the operon 548 may include genes encoding all of the components used by the cell 500 for performing HDR. Thus, addition of a vector such as the dsDNA molecule 544 may enable a cell 500 that includes the necessary engineered signaling pathway 536 to respond to detected signals by adding ssRNA sequence 522 into a target site 546 on the added dsDNA molecule 544. In this implementation, the HDR template 514, the enzyme 506, and any accessory proteins may be supplied by genes included in the operon 548. The genes in the operon 548 may be under the control of a single promoter 550 and operator 552.

In an implementation, the operon 548 may include any or all of a gene encoding an HDR template 554, a gene encoding an enzyme configured to make DSBs 556, and a gene that encodes a tracking molecule 558 (e.g., RNA, DNA, or protein) for monitoring “state” as described below. An operon 548 that includes genes encoding all of the products for performing HDR may be added to a cell-free system on a circular dsDNA molecule 544 that also includes a target site 546 to provide complete instructions for a molecular event logging system on one molecule.

The term “operably linked” as used herein means placing a gene under the regulatory control of a promoter, which then controls the transcription and optionally the translation of the gene. In the construction of heterologous promoter/structural gene combinations, it is generally preferred to position the genetic sequence or promoter at a distance from the gene transcription start site that is approximately the same as the distance between that genetic sequence or promoter and the gene it controls in its natural setting; i.e., the gene from which the genetic sequence or promoter is derived. As is known in the art, some variation in this distance can be accommodated without loss of function. Similarly, the preferred positioning of a regulatory sequence element with respect to a heterologous gene to be placed under its control is defined by the positioning of the element in its natural setting; i.e., the genes from which it is derived. “Constitutive promoters” are typically active, i.e., promote transcription, under most conditions. “Inducible promoters” are typically active only under certain conditions, such as in the presence of a given molecular factor (e.g., IPTG) or a given environmental condition (e.g., particular CO₂ concentration, nutrient levels, light, heat). In the absence of that condition, inducible promoters typically do not allow significant or measurable levels of transcriptional activity. For example, inducible promoters may be induced according to temperature, pH, a hormone, a metabolite (e.g., lactose, mannitol, an amino acid), light (e.g., wavelength specific), osmotic potential (e.g., salt induced), a heavy metal, or an antibiotic. Numerous standard inducible promoters are known to one of skill in the art.

Illustrative eukaryotic promoters known to one of skill in the art are listed below.

Promoter | Primarily used for | Description | Additional considerations
CMV | General expression | Strong mammalian expression promoter from the human cytomegalovirus | May contain an enhancer region. Can be silenced in some cell types.
EF1a | General expression | Strong mammalian expression from human elongation factor 1 alpha | Tends to give consistent expression regardless of cell type or physiology.
SV40 | General expression | Mammalian expression promoter from the simian vacuolating virus 40 | May include an enhancer.
PGK1 (human or mouse) | General expression | Mammalian promoter from phosphoglycerate kinase gene. | Widespread expression, but may vary by cell type. Tends to resist promoter down regulation due to methylation or deacetylation.
Ubc | General expression | Mammalian promoter from the human ubiquitin C gene | As the name implies, this promoter is ubiquitous.
human beta actin | General expression | Mammalian promoter from beta actin gene | Ubiquitous. Chicken version is commonly used in promoter hybrids.
CAG | General expression | Strong hybrid mammalian promoter | Contains CMV enhancer, chicken beta actin promoter, and rabbit beta-globin splice acceptor.
TRE | General expression | Tetracycline response element promoter | Typically contains a minimal promoter with low basal activity and several tetracycline operators. Transcription can be turned on or off depending on what tet transactivator is used.
UAS | General expression | Drosophila promoter containing Gal4 binding sites | Requires the presence of Gal4 gene to activate promoter.
Ac5 | General expression | Strong insect promoter from Drosophila Actin 5c gene | Commonly used in expression systems for Drosophila.
Polyhedrin | General expression | Strong insect promoter from baculovirus | Commonly used in expression systems for insect cells.
CaMKIIa | Gene expression for optogenetics | Ca2+/calmodulin-dependent protein kinase II promoter | Used for neuronal/CNS expression. Modulated by calcium and calmodulin.
GAL1, 10 | General expression | Yeast adjacent, divergently transcribed promoters | Can be used independently or together. Regulated by GAL4 and GAL80.
TEF1 | General expression | Yeast transcription elongation factor promoter | Analogous to mammalian EF1a promoter.
GDS | General expression | Strong yeast expression promoter from glyceraldehyde 3-phosphate dehydrogenase | Very strong, also called TDH3 or GAPDH.
ADH1 | General expression | Yeast promoter for alcohol dehydrogenase I | Full length version is strong with high expression. Truncated promoters are constitutive with lower expression.
CaMV35S | General expression | Strong plant promoter from the Cauliflower Mosaic Virus | Active in dicots, less active in monocots, with some activity in animal cells.
Ubi | General expression | Plant promoter from maize ubiquitin gene | Gives high expression in plants.
H1 | small RNA expression | From the human polymerase III RNA promoter | May have slightly lower expression than U6. May have better expression in neuronal cells.
U6 | small RNA expression | From the human U6 small nuclear promoter | Murine U6 is also used, but may be less efficient.

Illustrative prokaryotic promoters known to one of skill in the art are listed below.

Promoter | Primarily used for | Description | Expression | Additional considerations
T7 | in vitro transcription/general expression | Promoter from T7 bacteriophage | Constitutive, but requires T7 RNA polymerase. | When used for in vitro transcription, the promoter drives either the sense OR antisense transcript depending on its orientation to your gene.
T7lac | High levels of gene expression | Promoter from T7 bacteriophage plus lac operators | Negligible basal expression when not induced. Requires T7 RNA polymerase, which is also controlled by lac operator. Can be induced by IPTG. | Commonly found in pET vectors. Very tightly regulated by the lac operators. Good for modulating gene expression through varied inducer concentrations.
Sp6 | in vitro transcription/general expression | Promoter from Sp6 bacteriophage | Constitutive, but requires SP6 RNA polymerase. | SP6 polymerase has a high processivity. When used for in vitro transcription, the promoter drives either the sense OR antisense transcript depending on its orientation to your gene.
araBAD | General expression | Promoter of the arabinose metabolic operon | Inducible by arabinose and repressed by catabolite repression in the presence of glucose or by competitive binding of the anti-inducer fucose | Weaker. Commonly found in pBAD vectors. Good for rapid regulation and low basal expression; however, not well-suited for modulating gene expression through varied inducer concentrations.
trp | High levels of gene expression | Promoter from E. coli tryptophan operon | Repressible | Gets turned off with high levels of cellular tryptophan.
lac | General expression | Promoter from lac operon | Constitutive in the absence of lac repressor (lacI or lacIq). Can be induced by IPTG or lactose. | Leaky promoter with somewhat weak expression; lacIq mutation increases expression of the repressor 10x, thus tightening regulation of lac promoter. Good for modulating gene expression through varied inducer concentrations.
Ptac | General expression | Hybrid promoter of lac and trp | Regulated like the lac promoter | Contains −35 region from trpB and −10 region from lac. Very tight regulation. Good for modulating gene expression through varied inducer concentrations. Generally better expression than lac alone.
pL | High levels of gene expression | Promoter from bacteriophage lambda | Can be temperature regulatable | Often paired with the temperature sensitive cI857 repressor.

FIG. 6 shows a diagram 600 illustrating insertion of a first HDR template into a gene 602. The gene 602 can include a target site 604. The target site 604 can include a sequence of nucleotides that can direct an enzyme (not shown) to create a DSB in the gene 602 within the target site 604 at a cut site 606. The target site 604 can, in some cases, be part of a pre-existing sequence of nucleotides that is recognized by one or more enzymes to create the DSB. In other situations, the target site 604 can be added to the gene 602 by conventional genetic engineering techniques such that the DSB can be produced by one or more enzymes. Additionally, the gene 602 can include a single target site 604 in some implementations, while in other cases (not shown), the gene 602 can include multiple target sites 604. The enzyme used to create the DSB can include enzymes described previously in this application, such as restriction enzymes, homing endonucleases, zinc-finger nucleases, transcription activator-like effector nucleases, CRISPR/Cas, and NgAgo.

The DSB produced by the enzyme in the target site 604 produces a gap 608 and two subsequences 602(A) and 602(B) on either side of the gap 608. In various implementations, the target site 604 can include from about 10 nucleotides to about 40 nucleotides with each of the subsequences 602(A) and 602(B) having from about 5 nucleotides to about 20 nucleotides depending on the location of the cut site 606 within the target site 604. In some examples, the cut site 606 can be located in a middle portion of the target site 604. Alternatively, the cut site 606 can be included closer to the 3′-end of the target site 604 or closer to the 5′-end of the target site 604. The subsequences 602(A) and 602(B) can include the same sequences of nucleotides in particular implementations, but different sequences of nucleotides in additional implementations.

After the gap 608 is created by the DSB, a first HDR template 610 moves into proximity with the subsequences 602(A) and 602(B) and the gap 608. As described previously in this application, the first HDR template 610 can be a single strand of DNA or a single strand of RNA that is used to repair the DSB through homologous directed repair. A 3′-end sequence 610(A) of the first HDR template 610 can be complementary to the first subsequence 602(A) and a 5′-end sequence 610(B) of the first HDR template 610 can be complementary to the second subsequence 602(B). The 3′-end 610(A) and the 5′-end 610(B) can also have a length that is similar to or the same as the lengths of the first subsequence 602(A) and the second subsequence 602(B). Accordingly, the 3′-end sequence 610(A) and the 5′-end sequence 610(B) can include about 5 nucleotides to about 20 nucleotides.
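The relationship between the target site, the cut site, and the end sequences of the HDR template can be sketched in a few lines. This is a simplified model under stated assumptions: the function names, the example target site, and the placeholder middle portion are hypothetical; sequences are written 5′ to 3′ for one DNA strand; and the template is taken to be ssDNA that is complementary (antiparallel) to that strand, so its 3′-end pairs with the first subsequence and its 5′-end pairs with the second.

```python
# Illustrative sketch only; names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_target(target_site: str, cut_index: int):
    """Split a target site at the cut site into two subsequences."""
    if not 0 < cut_index < len(target_site):
        raise ValueError("cut site must fall inside the target site")
    return target_site[:cut_index], target_site[cut_index:]


def build_hdr_template(target_site: str, cut_index: int, middle: str) -> str:
    """Assemble 5'-end + middle portion + 3'-end, written 5'->3'."""
    sub_a, sub_b = split_target(target_site, cut_index)
    five_prime_end = revcomp(sub_b)   # pairs with the second subsequence
    three_prime_end = revcomp(sub_a)  # pairs with the first subsequence
    return five_prime_end + middle + three_prime_end


# Hypothetical 20-nucleotide target site cut in its middle portion.
target = "ACGTACGTAAGGCCTTGGCC"
sub_a, sub_b = split_target(target, 10)
template = build_hdr_template(target, 10, middle="TTTT")
print(sub_a, sub_b, template)
```

With a 20-nucleotide target site cut at its midpoint, each end sequence is 10 nucleotides long, which falls inside the range of about 5 to about 20 nucleotides given above.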

Between the 3′-end sequence 610(A) and the 5′-end sequence 610(B), the first HDR template 610 can include a middle portion 612 that includes a first splicing region 614, a barcode sequence 616, and a second splicing region 618. The first splicing region 614 can include a sequence of nucleotides that is recognized by an enzyme that can create a cut within the first splicing region 614. Additionally, the second splicing region 618 can include a sequence of nucleotides that is recognized by an enzyme that can create a cut within the second splicing region 618. In some implementations, the first splicing region 614 and the second splicing region 618 can include sequences of nucleotides that are recognized by a spliceosome. The spliceosome can create cuts at specific locations within the first splicing region 614 and the second splicing region 618. In an illustrative example, the first splicing region 614 can be an acceptor site of an intron and include an AG sequence that indicates a first cut site for a spliceosome. The first splicing region 614 can include a region that is high in pyrimidines, such as a polypyrimidine region. Additionally, the first splicing region 614 can include a branch sequence. The branch sequence can be from 20 to 50 nucleotides away (i.e., toward the 5′-end) from the 3′-end of the HDR template and include at least one adenine along with pyrimidines, and at least one additional purine. The second splicing region 618 can be a donor site of an intron and include a GU sequence that indicates a second cut site for a spliceosome in addition to additional purines and pyrimidines.

The barcode sequence 616 can include a number of nucleotides that comprise a sequence that corresponds to the gene 602. In some implementations, the barcode sequence 616 can uniquely correspond to the gene 602. That is, for each gene for which its expression is being analyzed within a given group of genes, a unique barcode sequence can be identified. The barcode sequence 616 can include any number of nucleotides that allows for identification of the gene such as, for example, at least 20 nucleotides, at least 50 nucleotides, at least 75 nucleotides, or at least 100 nucleotides. In some illustrative examples, the barcode sequence 616 can include from about 20 nucleotides to about 250 nucleotides, from about 20 nucleotides to about 100 nucleotides, from about 50 nucleotides to about 150 nucleotides, or from about 100 nucleotides to about 200 nucleotides.
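One way to picture the assignment of a unique barcode to each gene in a group is the sketch below. It is not a method taken from this disclosure: the function names, the gene names, the use of random sequences, and the minimum Hamming distance between barcodes (added so that a few sequencing errors would be less likely to turn one barcode into another) are all assumptions made for illustration. Only the 20-nucleotide length comes from the ranges given above.

```python
import random

# Illustrative sketch only. Random barcodes and a minimum Hamming
# distance are assumptions for this example, not requirements.


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcodes(genes, length=20, min_distance=5, seed=0):
    """Give each gene a barcode that differs from every other barcode."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    barcodes = {}
    for gene in genes:
        while True:
            candidate = "".join(rng.choice("ACGT") for _ in range(length))
            # Accept only candidates far enough from all prior barcodes.
            if all(hamming(candidate, other) >= min_distance
                   for other in barcodes.values()):
                barcodes[gene] = candidate
                break
    return barcodes


# Hypothetical group of genes whose expression is being analyzed.
barcodes = assign_barcodes(["geneA", "geneB", "geneC"])
for gene, barcode in barcodes.items():
    print(gene, barcode)
```

A practical design would likely add further filters, for example rejecting candidates that contain splice-site motifs, since the barcode is expected not to interfere with splicing; those filters are omitted here.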

As the first HDR template 610 moves into proximity with the first subsequence 602(A) and the second subsequence 602(B), HDR can repair the DSB and produce a modified gene 620 from the gene 602. As explained previously with respect to FIG. 1 and FIG. 2, the first HDR template 610 can displace one strand of the first subsequence 602(A) and the second subsequence 602(B) and pair with the other strand of the first subsequence 602(A) and the second subsequence 602(B) through the formation of a D loop and using DNA ligase. Once the first HDR template 610 is used to repair the DSB of a first strand of the gene 602, DNA polymerase can be utilized to produce a number of nucleotides complementary to those of the middle portion, thus repairing the second strand of the gene 602 at the DSB to produce dsDNA that is the modified gene 620. The middle portion 612 of the first HDR template 610 can be used to produce a gene expression region 622 of the modified gene 620 that includes at least the first splicing region 614, the barcode sequence 616, and the second splicing region 618.

FIG. 7 shows a diagram 700 illustrating the splicing of a second HDR template including a barcode sequence from an RNA precursor produced from the modified gene 620. The modified gene 620 can be under the control of a promoter 702 and an operator 704. As explained previously with respect to FIG. 3, the promoter 702 can be used to implement the expression of the modified gene 620 and the operator 704 can turn off the expression of the modified gene 620. The operator 704 can be deactivated and the promoter 702 can be activated using a signaling pathway that is activated in response to a stimulus. The stimulus can include one or more of the presence of a molecule, such as a protein or enzyme, the absence of a molecule, or a condition to which the modified gene 620 is exposed. In some cases, the modified gene 620 can be exposed to a condition that affects the activation of the promoter 702 and/or the operator 704, such as a temperature range, a pH range, exposure to a range of electromagnetic radiation, and the like.

In response to being activated, the modified gene 620 can produce a gene product 706. In the illustrative example of FIG. 7, the gene product 706 is an RNA precursor. In some implementations, the RNA precursor can be an mRNA precursor. The gene product 706 can have a structure that includes a 5′ UTR 708, a coding region 710, and a 3′ UTR 712. An example portion 714 of the coding region 710 can include a first intron 716, a first exon 718, the gene expression region 622, a second exon 720, and a second intron 722.

The gene product 706 can be contacted with an enzyme 724 that removes portions of the sequence of the gene product 706. For example, the enzyme 724 can include a spliceosome that removes introns from an mRNA precursor. In the illustrative example of FIG. 7, the enzyme 724 is used to remove the gene expression region 622 from the gene product 706. In various implementations, the gene expression region 622 can include a sequence of nucleotides that is recognized by the enzyme 724. In particular implementations, the first splicing region 614 and the second splicing region 618 can include nucleotide sequences that are recognized by the enzyme 724 such that the enzyme 724 can cut the gene expression region 622 at both the first splicing region 614 and the second splicing region 618. In an illustrative example, the gene expression region 622 can be designed so that the first splicing region 614 and the second splicing region 618 are the same as or similar to splicing regions that are recognized by one of the many spliceosomes utilized to splice introns from mRNA precursors. Additionally, the barcode sequence 616 of the gene expression region 622 can also include a nucleotide sequence that does not interfere with the splicing actions performed by the enzyme 724. In certain situations, the gene expression region 622 can include a nucleotide sequence that corresponds partially to one or more sequences of known introns that can be spliced by the enzyme 724.

The splicing of the gene expression region 622 by the enzyme 724 can produce a second HDR template 726. The second HDR template 726 can include a first end region 728, the barcode sequence 616, and a second end region 730. In some cases, the first end region 728 can include at least part of the nucleotide sequence that comprises the first splicing region 614 and the second end region 730 can include at least part of the nucleotide sequence that comprises the second splicing region 618. In particular implementations, the first end region 728 can include the nucleotide sequence of the first splicing region 614 minus one or more nucleotides removed by the enzyme 724. Additionally, the second end region 730 can include the nucleotide sequence of the second splicing region 618 minus one or more nucleotides removed by the enzyme 724.
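The excision of the gene expression region from the RNA precursor can be modeled as a string operation. The sketch below is a deliberate simplification: all names and sequences are hypothetical, the RNA is written 5′ to 3′ with the donor (GU) region at the 5′ side of the barcode and the acceptor (AG) region at the 3′ side, and the whole of both splicing regions is retained in the excised product, whereas the enzyme may in practice remove one or more nucleotides from each.

```python
# Illustrative sketch only; a string model of splicing in which both
# splicing regions are kept whole in the excised product (an assumption).


def splice_out(precursor: str, donor_region: str, acceptor_region: str):
    """Return (spliced RNA, excised region) for an RNA precursor."""
    start = precursor.find(donor_region)
    if start == -1:
        raise ValueError("donor region not found")
    # Look for the acceptor region only downstream of the donor region.
    acceptor_at = precursor.find(acceptor_region, start + len(donor_region))
    if acceptor_at == -1:
        raise ValueError("acceptor region not found")
    end = acceptor_at + len(acceptor_region)
    excised = precursor[start:end]               # candidate HDR template
    spliced = precursor[:start] + precursor[end:]  # exons joined together
    return spliced, excised


# Hypothetical sequences standing in for the elements of FIG. 7.
exon_1 = "AUGGCC"
donor = "GUAAGU"               # begins with the GU of a donor site
barcode = "ACGUACGUACGUACGUACGU"
acceptor = "UUUCCCUUUCAG"      # pyrimidine-rich, ends with AG
exon_2 = "GGAUAA"

precursor = exon_1 + donor + barcode + acceptor + exon_2
spliced, excised = splice_out(precursor, donor, acceptor)
print(spliced)
print(excised)
```

In this model the excised string plays the role of the second HDR template: the barcode flanked by two end regions derived from the splicing regions.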

FIG. 8 shows a diagram 800 illustrating insertion of the second HDR template 726 into an additional polynucleotide 802. The additional polynucleotide 802 can be dsDNA. In some cases, the additional polynucleotide 802 can include genomic DNA inside a living prokaryotic or eukaryotic cell. In other situations, the additional polynucleotide 802 can include dsDNA introduced into a living cell, such as a plasmid or vector. In still other examples, the additional polynucleotide 802 can include dsDNA in a cell-free system. The additional polynucleotide 802 can include linear or circular dsDNA prior to undergoing an HDR operation. The additional polynucleotide 802 can have a sequence that is different from the sequence of the gene 602.

The additional polynucleotide 802 can include a target site 804. The target site 804 can include a sequence of nucleotides that can direct an enzyme (not shown) to create a DSB in the additional polynucleotide 802 within the target site 804 at a cut site 806. The target site 804 can, in some cases, be part of a pre-existing sequence of nucleotides that is recognized by one or more enzymes to create the DSB. In other situations, the target site 804 can be added to the additional polynucleotide 802 by conventional genetic engineering techniques such that the DSB can be produced by one or more enzymes. Additionally, the additional polynucleotide 802 can include a single target site 804 in some implementations, while in other cases (not shown), the additional polynucleotide 802 can include multiple target sites 804. The enzyme used to create the DSB can include enzymes described previously in this application, such as restriction enzymes, homing endonucleases, zinc-finger nucleases, transcription activator-like effector nucleases, CRISPR/Cas, and NgAgo.

The DSB produced by the enzyme in the target site 804 produces a gap 808 and two subsequences 802(A) and 802(B) on either side of the gap 808. In various implementations, the target site 804 can include from about 10 nucleotides to about 40 nucleotides with each of the subsequences 802(A) and 802(B) having from about 5 nucleotides to about 20 nucleotides depending on the location of the cut site 806 within the target site 804. In some examples, the cut site 806 can be located in a middle portion of the target site 804. Alternatively, the cut site 806 can be included closer to the 3′-end of the target site 804 or closer to the 5′-end of the target site 804. The subsequences 802(A) and 802(B) can include the same sequences of nucleotides in particular implementations, but different sequences of nucleotides in additional implementations.

After the gap 808 is created by the DSB, the second HDR template 726 moves into proximity with the subsequences 802(A) and 802(B) and the gap 808. As described previously in this application, the second HDR template 726 can be a single stranded polynucleotide sequence that is used to repair the DSB through homologous directed repair. The first end region 728 can be complementary to the first subsequence 802(A) and the second end region 730 can be complementary to the second subsequence 802(B). The first end region 728 and the second end region 730 can also have a length that is similar to or the same as the lengths of the first subsequence 802(A) and the second subsequence 802(B). Accordingly, the first end region 728 and the second end region 730 can include about 5 nucleotides to about 20 nucleotides. Between the first end region 728 and the second end region 730, the second HDR template 726 includes the barcode sequence 616 that includes a sequence of nucleotides that corresponds to the gene 602.

As the second HDR template 726 moves into proximity with the first subsequence 802(A) and the second subsequence 802(B), HDR can be used to repair the DSB. In some cases, uptake of the second HDR template 726 by the additional polynucleotide 802 can depend on the length of time that the second HDR template 726 remains viable in the cell and on the concentration of the additional polynucleotide 802 in the cell. The length of time that the second HDR template 726 remains viable in the cell can be based on certain conditions of the cell, such as pH, temperature, and the presence or absence of enzymes or proteins that may facilitate the degradation of the second HDR template 726. As one of ordinary skill in the art will appreciate, the conditions and constituents of a cell can be optimized such that the concentration of the additional polynucleotide 802 and the length of time that the second HDR template 726 remains viable in the cell enable the second HDR template 726 to move into proximity with the first subsequence 802(A) and the second subsequence 802(B). Additionally, the sequence of the second HDR template 726 and the environment in which the second HDR template 726 and the additional polynucleotide 802 are located can be designed such that the second HDR template 726 can remain viable in a cell for a length of time sufficient to move into proximity with an additional polynucleotide 802 that has undergone a DSB, as understood by those of ordinary skill in the art and described in Clement, Jade Q., Sourindra Maiti, and Wilkinson, Miles F., Localization and Stability of Introns Spliced from the Pem Homeobox Gene, 276 The Journal of Biological Chemistry, 16919-16930 (May 18, 2001) and Hesselberth Jay R. Lives that introns lead after splicing, WIREs RNA 2013, 4: 677-691. doi: 10.1002/wrna.1187.

Performing HDR with the second HDR template 726, the first subsequence 802(A) and the second subsequence 802(B) can produce a new double stranded polynucleotide 810. As explained previously with respect to FIG. 1 and FIG. 2, the second HDR template 726 can displace one strand of the first subsequence 802(A) and the second subsequence 802(B) and pair with the other strand of the first subsequence 802(A) and the second subsequence 802(B) through the formation of a D loop and using DNA ligase. Once the second HDR template 726 is used to repair the DSB of a first strand of the additional polynucleotide 802, DNA polymerase can be utilized to produce a number of nucleotides complementary to those of the barcode sequence 616, thus repairing the second strand of the additional polynucleotide 802 at the DSB to produce the new double stranded polynucleotide 810. The new double stranded polynucleotide 810 can include a middle portion 812 that includes at least the barcode sequence 616. In some cases, the middle portion 812 can also include a number of nucleotides corresponding to the first end region 728 and/or the second end region 730. After producing the new double stranded polynucleotide 810, the new double stranded polynucleotide 810 can be sequenced. The sequencing of the new double stranded polynucleotide 810 can reveal the barcode sequence 616 in the middle portion 812 of the new double stranded polynucleotide 810, indicating the expression of the gene 602.
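The outcome of the repair described above can be sketched, purely as a string-level illustration, in Python. The sketch models only one strand, treats each end region of the template as identical in sequence to the flanking subsequence it pairs with, and ignores strand displacement, ligation, and polymerase activity. The function name hdr_insert, the arm_len parameter, and the example sequences are hypothetical and do not come from this application.

```python
def hdr_insert(target, cut_index, template, arm_len):
    """Return the repaired single-strand sequence after a simplified HDR event.

    target    -- sequence containing the target site (one strand only)
    cut_index -- position of the cut site within target
    template  -- end region + barcode + end region
    arm_len   -- length of each end region (hypothetical parameter)
    """
    left_arm = template[:arm_len]
    right_arm = template[-arm_len:]
    barcode = template[arm_len:-arm_len]

    # The double strand break splits the target into two subsequences.
    left, right = target[:cut_index], target[cut_index:]

    # Simplification: homology is modeled as exact string identity.
    if not left.endswith(left_arm) or not right.startswith(right_arm):
        raise ValueError("end regions do not match the flanking subsequences")

    # The barcode ends up between the two subsequences.
    return left + barcode + right


repaired = hdr_insert("AAACCCGGGTTT", 6, "CCC" + "ACGTACGT" + "GGG", 3)
print(repaired)
```

In this toy model the middle portion of the repaired sequence is the barcode flanked by the two end regions, which is what a later sequencing step would look for.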

FIG. 9 shows a diagram 900 illustrating joining a first HDR template and a second HDR template to produce a third HDR template using an RNA substrate. In particular, a gene 902 can be under the control of a promoter 904 and an operator 906. As explained previously with respect to FIG. 3, the promoter 904 can be used to implement the expression of the gene 902 and the operator 906 can turn off the expression of the gene 902. The operator 906 can be deactivated and the promoter 904 can be activated using a signaling pathway that is activated in response to a stimulus. The stimulus can include one or more of the presence of a molecule, such as a protein or enzyme, the absence of a molecule, or a condition to which the gene 902 is exposed. In some cases, the gene 902 can be exposed to a condition that affects the activation of the promoter 904 and/or the operator 906, such as a temperature range, a pH range, exposure to a range of electromagnetic radiation, and the like.

In response to being activated, the gene 902 can produce a gene product. In the illustrative example of FIG. 9, the gene product is an mRNA strand 908. The mRNA strand 908 can include a first portion 910, labeled as ‘A1’ in FIG. 9, and a second portion 912, labeled as ‘A2’ in FIG. 9. Additionally, a first HDR template 914 can be provided that includes a first region 916 that is homologous to the first portion 910 of the mRNA strand 908. The first region 916 of the first HDR template 914 is labeled “A1′” in FIG. 9. The first region 916 of the first HDR template 914 can have from 5 nucleotides to 75 nucleotides, from 10 nucleotides to 40 nucleotides, or from 20 nucleotides to 50 nucleotides. The first HDR template 914 can also include a remainder region 918. The remainder region 918 of the first HDR template 914 can have from 10 nucleotides to 40 nucleotides. In addition, the remainder region 918 can include a section that can be used in an HDR process. That is, in some cases, at least a portion of the remainder region 918 can be homologous to a target site of a polynucleotide utilized in HDR.

Further, a second HDR template 920 can be provided that includes a first region 922 that is homologous to the second portion 912 of the mRNA strand 908. The first region 922 of the second HDR template 920 is labeled as “A2′” in FIG. 9. The first region 922 of the second HDR template 920 can have from 5 nucleotides to 75 nucleotides, from 10 nucleotides to 40 nucleotides, or from 20 nucleotides to 50 nucleotides. The second HDR template 920 can also include a remainder region 924. The remainder region 924 of the second HDR template 920 can have from 10 nucleotides to 40 nucleotides. In addition, the remainder region 924 can include a section that can be used in an HDR process. That is, in some cases, at least a portion of the remainder region 924 can be homologous to a target site of a polynucleotide utilized in HDR. In some particular implementations, at least one of the remainder region 918 or the remainder region 924 can include a target region that can serve as an insertion site for an HDR operation.

In the illustrative example of FIG. 9, the first HDR template 914 can move to be proximate to the first portion 910 of the mRNA strand 908 and the second HDR template 920 can move to be proximate to the second portion 912 of the mRNA strand 908. Additionally, a 5′ end of the first HDR template 914 can move to be proximate to a 3′ end of the second HDR template 920. As the first region 916 of the first HDR template 914 becomes close enough to the first portion 910 of the mRNA strand 908, the first region 916 can anneal to the first portion 910. Also, as the first region 922 of the second HDR template 920 becomes close enough to the second portion 912 of the mRNA strand 908, the first region 922 can anneal to the second portion 912. Further, the 5′ end of the first HDR template 914 can be joined to the 3′ end of the second HDR template 920. In particular implementations, a ligase can be utilized to join the 5′ end of the first HDR template 914 with the 3′ end of the second HDR template 920. Thus, a modified mRNA strand 908 can be produced that includes a double stranded region 928. Further, joining the 5′ end of the first HDR template 914 to the 3′ end of the second HDR template 920 can produce a third HDR template 930. The third HDR template 930 can include the first region 916 and the remainder region 918 of the first HDR template 914 and the first region 922 and the remainder region 924 of the second HDR template 920.

FIG. 10 shows a diagram 1000 illustrating insertion of a portion of the third HDR template 930 into an additional polynucleotide 1002. The additional polynucleotide 1002 can be dsDNA. In some cases, the additional polynucleotide 1002 can include genomic DNA inside a living prokaryotic or eukaryotic cell. In other situations, the additional polynucleotide 1002 can include dsDNA introduced into a living cell, such as a plasmid or vector. In still other examples, the additional polynucleotide 1002 can include dsDNA in a cell-free system. The additional polynucleotide 1002 can include linear or circular dsDNA prior to undergoing an HDR operation.

The additional polynucleotide 1002 can include a target site 1004. The target site 1004 can include a sequence of nucleotides that can direct an enzyme (not shown) to create a DSB in the additional polynucleotide 1002 within the target site 1004 at a cut site 1006. The target site 1004 can, in some cases, be part of a pre-existing sequence of nucleotides that is recognized by one or more enzymes to create the DSB. In other situations, the target site 1004 can be added to the additional polynucleotide 1002 by conventional genetic engineering techniques such that the DSB can be produced by one or more enzymes. Additionally, the additional polynucleotide 1002 can include a single target site 1004 in some implementations, while in other cases (not shown), the additional polynucleotide 1002 can include multiple target sites 1004. The enzyme used to create the DSB can include enzymes described previously in this application, such as restriction enzymes, homing endonucleases, zinc-finger nucleases, transcription activator-like effector nucleases, CRISPR/Cas, and NgAgo.

The DSB produced by the enzyme in the target site 1004 produces a gap 1008 and two subsequences 1002(A) and 1002(B) on either side of the gap 1008. In various implementations, the target site 1004 can include from about 10 nucleotides to about 40 nucleotides with each of the subsequences 1002(A) and 1002(B) having from about 5 nucleotides to about 20 nucleotides depending on the location of the cut site 1006 within the target site 1004. In some examples, the cut site 1006 can be located in a middle portion of the target site 1004. Alternatively, the cut site 1006 can be included closer to the 3′-end of the target site 1004 or closer to the 5′-end of the target site 1004. The subsequences 1002(A) and 1002(B) can include the same sequences of nucleotides in particular implementations, but different sequences of nucleotides in additional implementations.

After the gap 1008 is created by the DSB, the third HDR template 930 moves into proximity with the subsequences 1002(A) and 1002(B) and the gap 1008. As described previously in this application, the third HDR template 930 can be a polynucleotide sequence that is used to repair the DSB through homologous directed repair. The remainder region 918 can be complementary to the first subsequence 1002(A) and the remainder region 924 can be complementary to the second subsequence 1002(B). The remainder region 918 and the remainder region 924 can also have a length that is similar to or the same as the lengths of the first subsequence 1002(A) and the second subsequence 1002(B). Between the remainder region 918 and the remainder region 924, the third HDR template 930 includes a barcode region 1010 that includes a sequence of nucleotides that corresponds to the gene 902. In some cases, the barcode region 1010 can uniquely identify the gene 902. The barcode region 1010 can be comprised of the first region 916 of the first HDR template 914 and the first region 922 of the second HDR template 920.

As the third HDR template 930 moves into proximity with the first subsequence 1002(A) and the second subsequence 1002(B), HDR can be used to repair the DSB. In some cases, uptake of the third HDR template 930 by the additional polynucleotide 1002 can depend on the length of time that the third HDR template 930 remains viable in the cell and on the concentration of the additional polynucleotide 1002 in the cell. The length of time that the third HDR template 930 remains viable in the cell can be based on certain conditions of the cell, such as pH, temperature, and the presence or absence of enzymes or proteins that may facilitate the degradation of the third HDR template 930. As one of ordinary skill in the art will appreciate, the conditions and constituents of a cell can be optimized such that the concentration of the additional polynucleotide 1002 and the length of time that the third HDR template 930 remains viable in the cell enable the third HDR template 930 to move into proximity with the first subsequence 1002(A) and the second subsequence 1002(B). Additionally, the sequence of the third HDR template 930 and the environment in which the third HDR template 930 and the additional polynucleotide 1002 are located can be designed such that the third HDR template 930 can remain viable in a cell for a length of time sufficient to move into proximity with an additional polynucleotide 1002 that has undergone a DSB, as understood by those of ordinary skill in the art and described in Clement, Jade Q., Sourindra Maiti, and Wilkinson, Miles F., Localization and Stability of Introns Spliced from the Pem Homeobox Gene, 276 The Journal of Biological Chemistry, 16919-16930 (May 18, 2001) and Hesselberth Jay R. Lives that introns lead after splicing, WIREs RNA 2013, 4: 677-691. doi: 10.1002/wrna.1187.

Performing HDR with the third HDR template 930, the first subsequence 1002(A) and the second subsequence 1002(B) can produce a new double stranded polynucleotide 1012. As explained previously with respect to FIG. 1 and FIG. 2, the third HDR template 930 can displace one strand of the first subsequence 1002(A) and the second subsequence 1002(B) and pair with the other strand of the first subsequence 1002(A) and the second subsequence 1002(B) through the formation of a D loop and using DNA ligase. Once the third HDR template 930 is used to repair the DSB of a first strand of the additional polynucleotide 1002, DNA polymerase can be utilized to produce a number of nucleotides complementary to those of the barcode region 1010, thus repairing the second strand of the additional polynucleotide 1002 at the DSB to produce the new double stranded polynucleotide 1012. The new double stranded polynucleotide 1012 can include a middle portion 1014 that includes at least the barcode region 1010. In some cases, the middle portion 1014 can also include a number of nucleotides corresponding to the remainder region 918 and/or the remainder region 924. After producing the new double stranded polynucleotide 1012, the new double stranded polynucleotide 1012 can be sequenced. The sequencing of the new double stranded polynucleotide 1012 can reveal the barcode region 1010 in the middle portion 1014 of the new double stranded polynucleotide 1012, indicating the expression of the gene 902.

Although not shown in the illustrative example of FIG. 10, the third HDR template 930 can still be joined to the first portion 910 and the second portion 912 of the mRNA strand 908 as the third HDR template 930 begins to join with portions of the first subsequence 1002(A) and the second subsequence 1002(B). In some cases, the third HDR template 930 can be separated from the mRNA strand 908 as the remainder region 918 and the remainder region 924 begin to join with the first subsequence 1002(A) and the second subsequence 1002(B), respectively. In other situations, the third HDR template 930 can be separated from the mRNA strand 908 during translation of the mRNA strand 908. In particular instances, the third HDR template 930 can be removed from the mRNA strand 908 before the HDR process begins. In still other implementations, the third HDR template 930 can be separated from the mRNA strand 908 as a polymerase produces the second strand of the double stranded polynucleotide 1012 that is complementary to the barcode region 1010 of the third HDR template 930.

Illustrative Processes

For ease of understanding, the processes discussed in this disclosure are delineated as separate operations represented as independent blocks. However, these separately delineated operations should not be construed as necessarily order dependent in their performance. The order in which a process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the process, or an alternate process. Moreover, it is also possible that one or more of the provided operations may be modified or omitted.

FIG. 11 shows an illustrative process 1100 for identifying the expression of a gene by sequencing DNA that includes a barcode sequence corresponding to the gene.

At 1102, the process 1100 includes producing a first HDR template including a first splicing region and a barcode region. The barcode region can include a nucleotide sequence that corresponds to the gene. For example, the sequence of the barcode can be used to specifically identify the gene. That is, identifying the presence of the barcode sequence in a polynucleotide can provide an indication of the expression of the gene. In some cases, the gene can be one of a plurality of genes and individual barcode sequences can be produced that correspond to the individual genes.

In particular implementations, data can be generated by one or more algorithms implemented by a computing device that indicates a number of barcode sequences, and individual barcode sequences can be arbitrarily associated with each gene. The one or more algorithms can take into consideration one or more criteria in order to generate the barcode sequences. To illustrate, the barcode sequences can be generated based on a particular range of lengths for the barcode sequences, such as from 50 nucleotides to 500 nucleotides, from 50 nucleotides to 250 nucleotides, or from 100 nucleotides to 200 nucleotides. In another illustrative example, the barcode sequences can be generated based on the stability of the barcode sequence within an environment. In certain situations, the environment can include a cell subjected to a set of conditions, such as a temperature range, a pH, and the like. In various implementations, the barcode sequences can be generated based on stability of the barcode sequence in a polynucleotide that also includes other sequences, such as one or more splicing regions, in an environment. The barcode sequences can also be generated with consideration of their behavior as single-stranded polynucleotides, such as the ability to form secondary structures like hairpin loops.
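A minimal Python sketch of such a barcode-generating algorithm follows, under stated assumptions. It draws random sequences within one of the length ranges named above (100 to 200 nucleotides), assigns one to each gene arbitrarily, and rejects candidates that fail a crude screen for hairpin-forming potential. The screen, its stem and min_loop thresholds, and all function names are hypothetical illustrations; the application does not specify a particular algorithm, and a practical design would likely use a thermodynamic folding model instead.

```python
import random

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def may_form_hairpin(seq, stem=8, min_loop=3):
    """Crude screen (assumed thresholds): True if any stem-length window
    has its reverse complement at least min_loop bases downstream."""
    for i in range(len(seq) - stem + 1):
        window = seq[i:i + stem]
        if reverse_complement(window) in seq[i + stem + min_loop:]:
            return True
    return False


def generate_barcodes(genes, min_len=100, max_len=200, seed=0):
    """Arbitrarily associate one distinct random barcode with each gene."""
    rng = random.Random(seed)
    assigned = {}
    used = set()
    for gene in genes:
        while True:
            length = rng.randint(min_len, max_len)
            candidate = "".join(rng.choice("ACGT") for _ in range(length))
            # Keep only unused candidates that pass the hairpin screen.
            if candidate not in used and not may_form_hairpin(candidate):
                break
        used.add(candidate)
        assigned[gene] = candidate
    return assigned


for gene, barcode in generate_barcodes(["gene_a", "gene_b"]).items():
    print(gene, len(barcode))
```

Other criteria mentioned above, such as stability under particular cell conditions, would enter as additional rejection tests inside the same loop.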

Additionally, the splicing region can include a sequence of nucleotides that is recognized by an enzyme to produce a cut in the splicing region. The enzyme can include a spliceosome that can identify a configuration of nucleotides and produce a cut at a specific location within the splicing region. The sequence of nucleotides for the splicing region can be generated by one or more computer-implemented algorithms. The one or more computer-implemented algorithms can take into consideration information known by one of ordinary skill in the art regarding sequences that are recognized by a number of spliceosomes and utilize the information to generate the sequence of the splicing region. For example, information known to those of ordinary skill in the art can indicate that a particular location of a splicing region can include any purine, and the one or more algorithms can be implemented to include adenine or guanine at the particular location. In another example, the one or more algorithms may not be flexible in determining a nucleotide at a location of a splicing sequence where the information known to those skilled in the art indicates that an adenine is to be present at the location.
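The distinction between flexible and fixed positions can be sketched in Python using standard IUPAC degenerate nucleotide codes, in which R denotes any purine (A or G), Y any pyrimidine, and N any nucleotide. The pattern "GURAGU" used here is only an illustrative input and is not asserted to be the sequence required by any particular spliceosome; the function names are likewise hypothetical.

```python
import random

# Standard IUPAC codes restricted to the symbols used in this sketch (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",    # any purine
    "Y": "CU",    # any pyrimidine
    "N": "ACGU",  # any nucleotide
}


def instantiate(pattern, rng):
    """Pick one concrete nucleotide for each position of a degenerate pattern.

    Fixed positions (e.g. 'A') offer a single choice; flexible positions
    (e.g. 'R') are chosen from the allowed set.
    """
    return "".join(rng.choice(IUPAC_RNA[symbol]) for symbol in pattern)


def matches(pattern, seq):
    """Check that a concrete sequence satisfies a degenerate pattern."""
    return len(pattern) == len(seq) and all(
        base in IUPAC_RNA[symbol] for symbol, base in zip(pattern, seq)
    )


pattern = "GURAGU"  # illustrative pattern only
site = instantiate(pattern, random.Random(1))
print(site, matches(pattern, site))
```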

In some cases, the first HDR template can include multiple splicing regions. In situations where the first HDR template is located at an end of the gene, such as the 3′ UTR, a single splicing region can be included in the first HDR template because a cut at the splicing region can be sufficient to free the first HDR template from a product produced from the gene, such as an mRNA precursor. In other situations, the first HDR template can be located within a coding portion of the gene. In these situations, the first HDR template can include multiple splicing regions. Each of the splicing regions can include a sequence of nucleotides known to those skilled in the art to be recognized by a spliceosome to produce a cut at each of the splicing regions. In particular implementations, the first HDR template can be inserted into the gene as an intron. In an illustrative example, the first HDR template can include two splicing regions with the barcode region located between the splicing regions.

At 1104, the process 1100 includes inserting the first HDR template into a target site of a gene using homologous directed repair. In particular, an enzyme, such as a nuclease, can be utilized to create a DSB in a target site of the gene. At least a portion of the splicing region or regions can be homologous to a corresponding portion of the gene at the cut site. In some implementations, a portion of the barcode sequence can be homologous to a corresponding portion of the gene at the cut site. In implementations where the first HDR template includes a barcode region located between two splicing regions, at least a portion of a first splicing region can be homologous to a first portion of a target site of the gene located on a first side of the DSB and at least a portion of the second splicing region can be homologous to a second portion of the target site located on a second side of the DSB that is situated opposite the DSB to the first side. Homology directed repair can be used to insert the first HDR template into the target site of the gene.

In various implementations, the target site of the gene can be a region that is naturally occurring in the gene. In other implementations, the target site can be inserted into the gene through HDR. That is, an HDR template that includes a sequence of the target site can be inserted into the gene before the first HDR template including the barcode region is inserted into the gene.

At 1106, the process 1100 includes removing the first HDR template from an RNA precursor using an enzyme to produce a second HDR template. In particular implementations, expression of the gene can take place in response to one or more signals. The one or more signals can be related to an environment of the gene. For example, the one or more signals can be related to a temperature, a pH, the presence of a protein, the presence of an enzyme, or combinations thereof. As the gene is expressed, an RNA precursor can be formed before RNA, such as mRNA, is produced that can be utilized to form a protein or other product encoded by the gene. The RNA precursor can include a 5′ UTR, a 3′ UTR, and a coding region that includes introns and exons. The introns and the first HDR template can be removed from the RNA precursor by spliceosomes that recognize splicing sequences within the RNA precursor and make cuts within the various splicing sequences.

The action of an enzyme to cut the first HDR template at the first splicing region can produce the second HDR template, which includes the barcode region and at least a portion of the first splicing region since some amount of the first splicing region can be left behind after the cut made by the enzyme. Additionally, when the first HDR template includes a second splicing region, the second HDR template can include at least a portion of the second splicing region.

The sequence of the second HDR template can be designed such that the second HDR template remains viable in the environment for a specified period of time. In some implementations, the sequence of the second HDR template can be designed using one or more algorithms implemented by a computing device and relying on knowledge available to one of ordinary skill in the art. For example, the one or more algorithms can utilize knowledge of one of ordinary skill in the art regarding the viability of introns in certain environments and generate a sequence for the second HDR template that is likely to remain viable in an environment for the specified period of time.

At 1108, the process 1100 includes inserting the second HDR template into a section of an additional polynucleotide using homology directed repair to produce a modified double stranded polynucleotide. The second HDR template can be inserted into a section of the additional polynucleotide by bringing the second HDR template in contact with the additional polynucleotide. The section of the additional polynucleotide that the second HDR template is inserted into can be a target site that includes a cut site. A DSB can be created at the cut site using an enzyme, such as a nuclease. The additional double stranded polynucleotide can include genomic DNA or artificial DNA. Also, in some cases, the additional double stranded polynucleotide can include linear DNA or circular DNA before the insertion of the second HDR template into the target site.

In particular implementations, the second HDR template can include a first portion that is homologous to a first section of the target site of the additional double stranded polynucleotide that is on one side of the DSB and a second portion that is homologous to a second section of the target site that is on the other side of the DSB. In some cases, the first portion of the second HDR template can include at least part of the first splicing region. In various implementations, the first portion of the second HDR template can also include a portion of the sequence of the barcode region. In situations where the second HDR template is formed using only a single splicing region, the second portion of the second HDR template can be comprised of a portion of the sequence of the barcode region. In examples where the second HDR template is formed using two splicing regions, the second portion of the second HDR template can be comprised of a portion of a second splicing region. Additionally, in some cases where the second HDR template is formed using two splicing regions, the second portion of the second HDR template can be comprised of a portion of the second splicing region and a portion of the barcode region.

In some cases, the additional double stranded polynucleotide can include multiple target sites. A first target site can be utilized to insert the second HDR template into the additional double stranded polynucleotide. Additionally, a second target site can be utilized to insert sequences corresponding to other indicators into the additional double stranded polynucleotide. For example, a second target site can be utilized to insert a timing indicator into the additional double stranded polynucleotide. To illustrate, a signal associated with a particular time can be generated and cause an enzyme to create a DSB at the second target site. Also, an HDR template that corresponds to the timing event can be brought into contact with the second target site and be inserted into the additional double stranded polynucleotide using HDR. In this way, a timing related to the insertion of the second HDR template into the additional double stranded polynucleotide can be recorded in the additional double stranded polynucleotide. The insertion of timing indicators into the additional double stranded polynucleotide can be performed according to the techniques described in U.S. Pat. No. 10,892,034 entitled “Timing of Logged Molecular Events,” which is incorporated by reference herein in its entirety.

At 1110, the process 1100 includes sequencing the modified double stranded polynucleotide to produce sequencing data. The sequencing of the modified double stranded polynucleotide can be performed by any polynucleotide sequencing technique known to those of skill in the art. The sequencing data can include information indicating the nucleotides present at the various positions of the modified double stranded polynucleotide.

At 1112, the process 1100 includes determining that the gene has been expressed based at least partly on identifying the barcode sequence in the sequencing data. In particular, the sequencing data can be compared with a record of the barcode sequence. In response to determining that the modified double stranded polynucleotide includes the barcode sequence or includes substantially all of the barcode sequence based on the comparison, the expression of the gene can be identified. This is because the barcode sequence is inserted into the additional double stranded polynucleotide as a result of the expression of the gene: the gene expression makes the second HDR template, which includes the barcode sequence, available to be added to the additional double stranded polynucleotide.
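The comparison at 1112 can be sketched in Python as a search of each sequencing read for each recorded barcode. Here "substantially all" is interpreted, as an assumption of this sketch only, as a best ungapped match covering at least a chosen fraction of barcode positions; the min_identity threshold of 0.9, the function names, and the example sequences are hypothetical. Real sequencing data would typically call for a gapped aligner and a search of both strands.

```python
def best_identity(read, barcode):
    """Highest fraction of barcode positions matched by any ungapped window of read."""
    n = len(barcode)
    if len(read) < n:
        return 0.0
    best = 0
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        matched = sum(a == b for a, b in zip(window, barcode))
        best = max(best, matched)
    return best / n


def expressed_genes(reads, barcode_record, min_identity=0.9):
    """Return the genes whose barcode appears (nearly intact) in any read."""
    found = set()
    for gene, barcode in barcode_record.items():
        if any(best_identity(read, barcode) >= min_identity for read in reads):
            found.add(gene)
    return found


# Hypothetical record of barcodes and one read carrying gene_a's barcode
# with a single mismatch, flanked by unrelated sequence.
record = {
    "gene_a": "ACGTTGCAAGGCTTAACCGG",
    "gene_b": "TTTTCCCCGGGGAAAATTCC",
}
read = "GGATCC" + "ACGTTGCATGGCTTAACCGG" + "CTGCAG"
print(expressed_genes([read], record))
```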

FIG. 12 shows an additional illustrative process 1200 for identifying the expression of a gene by sequencing DNA that includes a barcode sequence corresponding to the gene.

At 1202, the process 1200 includes producing a first HDR template including a region complementary to a first portion of an RNA strand produced from the expression of the gene. In some cases, the RNA can include mRNA that is produced during the expression of the gene. The first HDR template can include a portion that is a first part of a barcode sequence that can be utilized to identify the gene. The first part of the barcode sequence can correspond to a section of the first HDR template that is complementary to a first portion of the RNA strand. In this way, the portion of the first HDR template corresponding to the first part of the barcode sequence can be joined to the first portion of the RNA strand. Additionally, the first HDR template can include other useful sequences. For example, the first HDR template can include a target region that can be utilized as an insertion region in an HDR operation. Further, the first HDR template can include a region that is homologous to a portion of an insertion site of a polynucleotide that is utilized in an HDR process.

At 1204, the process 1200 includes producing a second HDR template including a region complementary to a second portion of the RNA strand. The second HDR template can include a portion that is a second part of a barcode sequence that can be utilized to identify the gene. The second part of the barcode sequence can be a section of the second HDR template that is complementary to a second portion of the RNA strand. In this way, the portion of the second HDR template corresponding to the second part of the barcode sequence can be joined to the second portion of the RNA strand. Additionally, the second HDR template can include other useful sequences. For example, the second HDR template can include a target region that can be utilized as an insertion region in an HDR operation. Further, the second HDR template can include a region that is homologous to a portion of an insertion site of a polynucleotide that is utilized in an HDR process.

At 1206, the process 1200 includes annealing the first HDR template to the first portion of the RNA strand and the second HDR template to the second portion of the RNA strand to produce a modified RNA strand. In particular, the section of the first HDR template complementary to the first portion of the RNA strand can be annealed to the first portion, while the section of the second HDR template complementary to the second portion of the RNA strand can be annealed to the second portion.

At 1208, the process 1200 includes joining a 5′ end of the first HDR template to a 3′ end of the second HDR template to produce a third HDR template. In various implementations, the first portion of the RNA strand and the second portion of the RNA strand can be adjacent to each other. Accordingly, when a section of the first HDR template and a section of the second HDR template are annealed to respective portions of the RNA strand, the 5′ end of the first HDR template and the 3′ end of the second HDR template can be proximate to one another. A ligase can then be utilized to join the 5′ end of the first HDR template and the 3′ end of the second HDR template. In some cases, an RNA ligase can be utilized, while in other situations, a DNA ligase can be utilized. In some illustrative examples, a ligase used to join the 5′ end of the first HDR template to the 3′ end of the second HDR template can include a T4 RNA ligase, such as T4 RNA Ligase 1 or T4 RNA Ligase 2, Deinococcus radiodurans RNA ligase, or bacteriophage T4 DNA ligase.

At 1210, the process 1200 includes inserting a portion of the third HDR template into a target site of a polynucleotide using homology directed repair to produce a modified double stranded polynucleotide. The portion of the third HDR template can be inserted into a section of the polynucleotide by bringing the third HDR template in contact with the polynucleotide. In particular, the portions of the third HDR template that are complementary to the target site of the polynucleotide can be contacted. The section of the polynucleotide that the third HDR template is inserted into can be a target site that includes a cut site. A DSB can be created at the cut site using an enzyme, such as a nuclease. The polynucleotide can include genomic DNA or artificial DNA. Also, in some cases, the polynucleotide can include linear DNA or circular DNA before the insertion of the third HDR template into the target site.

In particular implementations, to design the sequence of the third HDR template, RNA sequences produced during the transcription of a particular gene can be analyzed and certain regions of the mRNA can be determined that uniquely identify the gene. A portion of an mRNA sequence that uniquely identifies the gene can be from 10 nucleotides to 120 nucleotides, from 20 nucleotides to 100 nucleotides, or from 25 nucleotides to 80 nucleotides. The first HDR template and the second HDR template can be designed such that a portion of the first HDR template is complementary to a first part of the unique RNA sequence and a portion of the second HDR template is complementary to a second part of the unique RNA sequence. In this way, when joined in the third HDR template, a portion of the first HDR template and a portion of the second HDR template can comprise a complete barcode sequence that identifies RNA that is produced during expression of the gene.
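As a non-limiting illustration, the search for mRNA regions that uniquely identify a gene can be sketched as a k-mer comparison across transcripts. The function name `unique_regions`, the example transcripts, and the window length `k` are hypothetical and chosen only for illustration; an actual design would likely also account for mismatches, secondary structure, and hybridization conditions.

```python
from collections import defaultdict

def unique_regions(transcripts, k=25):
    """Return, for each gene, the k-mers found in that gene's mRNA and in no other gene's mRNA.

    transcripts: dict mapping a gene name to its mRNA sequence (hypothetical input format).
    k: window length in nucleotides (the description mentions 10 nt to 120 nt).
    """
    owners = defaultdict(set)  # k-mer -> set of genes whose mRNA contains it
    for gene, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(gene)

    result = {gene: [] for gene in transcripts}
    for gene, seq in transcripts.items():
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Keep each k-mer once, in order of first appearance, if only this gene has it.
            if owners[kmer] == {gene} and kmer not in seen:
                seen.add(kmer)
                result[gene].append(kmer)
    return result

# Toy transcripts (made up); real mRNAs would be hundreds or thousands of nucleotides long.
transcripts = {
    "geneA": "AUGGCUACGUUAGC",
    "geneB": "AUGGCUACCCGGAU",
}
candidates = unique_regions(transcripts, k=8)
```

In this toy input, the shared prefix is excluded and the remaining windows of each transcript are reported as candidate identifying regions.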

Further, an additional portion of the first HDR template can be designed to be complementary to a first portion of an insertion site of a polynucleotide and an additional portion of the second HDR template can be designed to be complementary to a second portion of an insertion site of the polynucleotide. In this way, the portions of the first HDR template and the second HDR template that are complementary to the insertion site can be joined to the polynucleotide, which enables the barcode sequence to be inserted into the polynucleotide at the insertion site using HDR.

The first HDR template, the second HDR template, and the third HDR template can also be designed with respect to their viability in an environment. In certain situations, the environment can include a cell subjected to a set of conditions, such as a temperature range, a pH, and the like. Additionally, the first HDR template, the second HDR template, and the third HDR template can be designed with respect to the strength of the attachment to the RNA strand and the polynucleotide utilized in the HDR operation. In this way, the third HDR template can be separated from the RNA strand after portions of the first HDR template and the second HDR template that are complementary to the target site of the polynucleotide are joined to the polynucleotide.

In an illustrative example, the RNA strand can be mRNA having a sequence -A1-A2-A3-A4-, where A1 and A4 can be sequences of hundreds or thousands of nucleotides and A2 and A3 are sequences that include from 10 nt to 40 nt. A2 and A3 can together comprise a barcode sequence for a gene that produces the mRNA during expression. Additionally, the first HDR template can have a sequence X-A2′ and the second HDR template can have a sequence A3′-YY-X with the third HDR template having the sequence X-A2′-A3′-YY-X. In this example, X is a sequence that is complementary to a portion of an insertion site on a polynucleotide in which the barcode sequence can be inserted. Also, YY is a sequence that can provide an additional insertion site once the third HDR template is inserted into the polynucleotide.
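The arrangement X-A2′-A3′-YY-X can be sketched in a few lines of code. The sketch below is illustrative only: the sequences for A2, A3, X, and YY are made up, the function names are hypothetical, and the primed segments are assumed to be DNA written position-by-position against the mRNA (that is, the antiparallel strand read 3′ to 5′), which is one reading of the notation above.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def complement_of_rna(rna):
    """Position-by-position DNA complement of an RNA segment.

    The result is written in the same left-to-right order as the RNA,
    so it represents the antiparallel strand read 3' to 5' (an assumption).
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in rna)

def build_templates(a2, a3, x, yy):
    """Assemble the first (X-A2'), second (A3'-YY-X), and third (joined) templates."""
    first = x + complement_of_rna(a2)
    second = complement_of_rna(a3) + yy + x
    third = first + second  # models ligation of the two annealed templates
    return first, second, third

# Hypothetical 10 nt barcode halves taken from the mRNA, plus made-up X and YY.
a2, a3 = "AUGGCUACGU", "UAGCCGAUAC"
x, yy = "GATTACA", "CCGGAATT"
first, second, third = build_templates(a2, a3, x, yy)
```

Here the barcode carried by the third template is the concatenation A2′-A3′, flanked on both sides by X.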

At 1212, the process 1200 includes determining that the gene has been expressed based at least partly on sequencing data of the modified polynucleotide. In particular, the modified polynucleotide can be sequenced to produce sequencing data and the sequencing data can be analyzed. That is, the sequencing data can be compared to the barcode sequence and, upon determining that a portion of the sequence data corresponds to the barcode sequence, a determination can be made that the gene has been expressed. This is because insertion of the barcode sequence into the polynucleotide occurs as a result of the production of the RNA strand that binds the third HDR template during the expression of the gene.
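One simple way to picture the comparison at 1212 is an exact search of each sequencing read for the barcode on either strand. The following sketch assumes error-free reads and exact matching; the function names and example reads are hypothetical, and a practical analysis would likely tolerate sequencing errors.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(DNA_COMPLEMENT)[::-1]

def gene_expressed(reads, barcode):
    """Return True if any read contains the barcode on either strand (exact match only)."""
    targets = (barcode, reverse_complement(barcode))
    return any(target in read for read in reads for target in targets)

# Made-up reads: the first carries the barcode, the second carries its reverse complement.
barcode = "ACCATGCA"
forward_hit = gene_expressed(["TTGACCATGCATGGA"], barcode)
reverse_hit = gene_expressed(["GGTGCATGGTCC"], barcode)
no_hit = gene_expressed(["AAAAAAAAAAAA"], barcode)
```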

Furthermore, although the process 1200 has been described with respect to ligation of the 5′ end of the first HDR template and the 3′ end of the second HDR template to produce the third HDR template for subsequent insertion into the target site of the polynucleotide, other methods can be utilized to produce the third HDR template. For example, the RNA strand can enable the first HDR template and the second HDR template to serve as sequence and ligation independent cloning (SLIC) templates during insertion of the barcode sequence into the polynucleotide.

Illustrative System and Computing Devices

FIG. 13 shows a system 1300 for designing barcode sequences and utilizing the barcode sequences to identify the expression of a gene. The architecture may include any of a digital computer 1302, an oligonucleotide synthesizer 1304, an automated system 1306, and/or a polynucleotide sequencer 1308. The system 1300 may also include other components besides those discussed herein.

As used herein, “digital computer” means a computing device including at least one hardware microprocessor 1310 and memory 1312 capable of storing information in a binary format. The digital computer 1302 may be a supercomputer, a server, a desktop computer, a notebook computer, a tablet computer, a game console, a mobile computer, a smartphone, or the like. The hardware microprocessor 1310 may be implemented in any suitable type of processor such as a single core processor, a multicore processor, a central processing unit (CPU), a graphics processing unit (GPU), or the like. The memory 1312 may include removable storage, non-removable storage, local storage, and/or remote storage to provide storage of computer readable instructions, data structures, program modules, and other data. The memory 1312 may be implemented as computer-readable media. Computer-readable media includes at least two types of media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.

In contrast, communications media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media and communications media are mutually exclusive.

The digital computer 1302 may also include one or more input/output device(s) 1314 such as a keyboard, a pointing device, a touchscreen, a microphone, a camera, a display, a speaker, a printer, and the like.

An HDR template designer 1316 may be included as part of the digital computer 1302, for example, as instructions stored in the memory 1312. The HDR template designer 1316 may design HDR templates based on sequences of target sites, sequences of dsDNA molecules, enzyme recognition sites, etc. In one implementation, the HDR template designer 1316 may design HDR templates to avoid cross talk between different signal recording pathways. The HDR template designer 1316 may also compare percent similarity and hybridization conditions for potential HDR templates as well as portions of the HDR templates. For example, the HDR template designer 1316 may design HDR templates to avoid the formation of hairpins as well as to prevent or minimize annealing between HDR templates. The HDR template designer 1316 may also design HDR templates to maximize a difference between the 3′-end sequence, 5′-end sequence, and/or middle sequence. For example, the difference may be G:C content and the HDR template designer 1316 may design sequences with a preference for increasing the G:C content difference between the end sequences and the middle sequence. The HDR template designer 1316 can also generate barcode sequences and splicing sequences to include in HDR templates. In some cases, a table indicating the individual barcode sequences that correspond to each gene can be stored in the memory 1312 and utilized to determine that a gene has been expressed after analyzing the sequence data 1320.
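By way of a hedged example, the G:C content preference described for the HDR template designer 1316 could be scored as follows. The scoring function, the candidate sequences, and the fixed end length are assumptions made for illustration and are not a description of how the designer 1316 is actually implemented.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def end_vs_middle_gc_difference(template, end_length):
    """Absolute G:C content difference between the two end sequences and the middle sequence.

    Assumes len(template) > 2 * end_length so that the middle is non-empty.
    """
    ends = template[:end_length] + template[-end_length:]
    middle = template[end_length:-end_length]
    return abs(gc_fraction(ends) - gc_fraction(middle))

def pick_template(candidates, end_length):
    """Choose the candidate with the largest end-versus-middle G:C difference."""
    return max(candidates, key=lambda t: end_vs_middle_gc_difference(t, end_length))

# Two made-up 16 nt candidates with 4 nt ends.
gc_rich_ends = "GCGC" + "ATATATAT" + "GCGC"
uniform = "GCAT" + "GCATGCAT" + "ATGC"
best = pick_template([uniform, gc_rich_ends], end_length=4)
```

A fuller design routine would combine a score like this with the hairpin, cross-talk, and similarity checks mentioned above.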

A sequence data analyzer 1318 may analyze sequence data 1320 generated by the polynucleotide sequencer 1308. The sequence data analyzer 1318 may be implemented as instructions stored in the memory 1312. Thus, sequence data 1320 may be provided to the sequence data analyzer 1318 which analyzes the sequence data 1320 to identify any barcode sequences included in the sequence data. The sequence data analyzer 1318 may also identify which signals were detected by a cell 1322 and may identify timing indicators or barcode sequences included in the DNA of the cell 1322. Depending on the design of the cell 1322, the sequence data analyzer 1318 may also identify a signal strength, relative signal strength, order of different signals, signal duration, timing of signals, or other characteristic of one or more signals represented in the sequence data 1320. As used herein, “cell” includes biological cells, minimal cells, artificial cells, and synthetic cells.
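For illustration only, a barcode-to-gene table of the kind described above can be used to tally barcode occurrences per gene, which gives a rough proxy for relative signal strength. The table contents, reads, and function name below are hypothetical, and the sketch counts exact, non-overlapping matches on the given strand only.

```python
from collections import Counter

def count_barcodes(reads, barcode_table):
    """Count exact, non-overlapping barcode occurrences per gene across all reads.

    barcode_table: dict mapping a gene name to its barcode sequence (hypothetical format).
    """
    counts = Counter({gene: 0 for gene in barcode_table})
    for read in reads:
        for gene, barcode in barcode_table.items():
            counts[gene] += read.count(barcode)
    return counts

# Made-up table and reads.
barcode_table = {"geneA": "ACCATGCA", "geneB": "TTGGCCAA"}
reads = ["GGACCATGCATT", "ACCATGCAGGACCATGCA", "CCCC"]
counts = count_barcodes(reads, barcode_table)
expressed = [gene for gene, n in counts.items() if n > 0]
```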

In order to manipulate the DNA and potentially RNA that makes up the HDR templates and dsDNA, the digital computer 1302 may communicate with other devices through one or more I/O data interfaces 1324. The I/O data interface(s) 1324 can exchange instructions and data with other devices such as the oligonucleotide synthesizer 1304, the automated system 1306, and the polynucleotide sequencer 1308.

The oligonucleotide synthesizer 1304 chemically synthesizes oligonucleotides based on instructions received as electronic data. The synthesized oligonucleotides may be used as HDR templates, as dsDNA molecules that provide target sites, as plasmids, vectors, or other components. Thus, in some implementations, the sequence of nucleotides which is provided to the oligonucleotide synthesizer 1304 may come from the HDR template designer 1316.

A number of methods for DNA synthesis and commercial oligonucleotide synthesizers are available. Methods for DNA synthesis include solid-phase phosphoramidite synthesis, microchip-based oligonucleotide synthesis, ligation-mediated assembly, PCR-mediated assembly, and the like. For example, such synthesis can be performed using an ABI 394 DNA Synthesizer (Applied Biosystems, Foster City, Calif.) at a 0.2 μmol scale followed by a standard cleavage and deprotection protocol, e.g., using 28% aqueous ammonia or a 3:1 solution of ammonia in methanol. One having ordinary skill in the art can select other cleaving agents, such as methylamine, to be used instead of, or in addition to, ammonia, if desired.

The term “oligonucleotide” as used herein is defined as a molecule including two or more nucleotides. Oligonucleotides include probes and primers. Oligonucleotides used as probes or primers may also include nucleotide analogues such as phosphorothioates, alkylphosphorothioates, peptide nucleic acids, or intercalating agents. The introduction of these modifications may be advantageous in order to positively influence characteristics such as hybridization kinetics, reversibility of the hybrid-formation, stability of the oligonucleotide molecules, and the like.

The automated system 1306 may include any type of robotics, automation, or other system for automating one or more manipulations that may be performed on the dsDNA with the enzymes and/or the HDR templates. The automated system 1306 may be used in conjunction with manual operations such that the operations needed to practice the techniques of this disclosure are performed in a hybrid manner in which some are performed by the automated system 1306 and others manually.

In one implementation, the automated system 1306 may include a microfluidics system. An illustrative microfluidics system may be configured to move small volumes of liquid according to techniques well-understood by those of ordinary skill in the art. As used herein, the automated system 1306 may include other equipment for manipulating DNA beyond that expressly shown in FIG. 13 such as, for example, a thermocycler.

The automated system 1306 may include a cell-free system that can be implemented in part by microfluidics. The cell-free system may also be implemented as an artificial cell or a minimal cell. As used herein the term “cell” encompasses natural cells, artificial cells, and minimal cells unless context clearly indicates otherwise. The automated system 1306 may include one or more natural cells such as a cell in culture. A culture of cells in the automated system 1306 may be manipulated by an automated cell culture system. An artificial cell or minimal cell is an engineered particle that mimics one or many functions of a biological cell. Artificial cells are biological or polymeric membranes which enclose biologically active materials. As such, nanoparticles, liposomes, polymersomes, microcapsules, detergent micelles, and a number of other particles may be considered artificial cells. Micro-encapsulation allows for metabolism within the membrane, exchange of small molecules and prevention of passage of large substances across it. Membranes for artificial cells can be made of simple polymers, crosslinked proteins, lipid membranes or polymer-lipid complexes. Further, membranes can be engineered to present surface proteins such as albumin, antigens, Na/K-ATPase carriers, or pores such as ion channels. Commonly used materials for the production of membranes include hydrogel polymers such as alginate, cellulose and thermoplastic polymers such as hydroxyethyl methacrylate-methyl methacrylate (HEMA-MMA), poly-acrylonitrile-polyvinyl chloride (PAN-PVC), as well as variations of the above-mentioned materials.

Minimal cells, also known as proto-cells, are cells that have all the minimum requirements for life. Minimal cells may be created by a top-down approach that knocks out genes in a single-celled organism until a minimal set of genes necessary for life is identified. Mycoplasma mycoides, E. coli, and Saccharomyces cerevisiae are examples of organisms that may be modified to create minimal cells. One of ordinary skill in the art will recognize multiple techniques for generating minimal cells.

The cell-free system includes components for DNA replication and repair such as nucleotides, DNA polymerase, and DNA ligase. The cell-free system will also include dsDNA that includes at least one initial target site for creating a DSB. The dsDNA may be present in a vector that includes one or more operons. The cell-free system will also include buffers to maintain pH and ion availability. Furthermore, the cell-free system may also include the enzymes used for creating DSBs in dsDNA and the HDR templates used for repairing dsDNA. Some cell-free systems may include genes encoding the enzymes and HDR templates. To prevent enzymes from remaining when their respective cutting functions are no longer desired, the cell-free system may include proteolytic enzymes that specifically break down nucleases.

In a cell-free system, particular components may be added when needed either by moving volumes of liquid together with microfluidics or by increasing the expression of gene products that leads to synthesis of enzymes, HDR templates, etc.

The automated system 1306 may include a structure, such as at least one chamber, which holds one or more DNA molecules. The chamber may be implemented as any type of mechanical, biological, or chemical arrangement which holds a volume of liquid, including DNA, to a physical location. For example, a single flat surface having a droplet present thereon, with the droplet held by surface tension of the liquid, even though not fully enclosed within a container, is one implementation of a chamber.

The automated system 1306 may perform many types of manipulations on DNA molecules. For example, the automated system 1306 may be configured to move a volume of liquid from one chamber to another chamber in response to a series of instructions from the I/O data interface 1324.

The polynucleotide sequencer 1308 may sequence DNA molecules using any technique for sequencing polynucleotides known to those skilled in the art including classic dideoxy sequencing reactions (Sanger method), sequencing by synthesis using reversibly terminated labeled nucleotides, pyrosequencing, nanopore sequencing, SOLiD sequencing, chemical-sensitive field effect transistor (chemFET) sequencing, and ion semiconductor sequencing. The polynucleotide sequencer 1308 may be configured to sequence all or part of a dsDNA molecule modified according to any of the techniques described above and provide the sequence data 1320 to the digital computer 1302.

The cell 1322 may be prepared for sequencing by extracting nucleic acids according to standard methods in the art. For example, DNA from a cell can be isolated using various lytic enzymes or chemical solutions, or extracted by nucleic acid binding resins following instructions provided by a manufacturer. DNA contained in the extracted sample may be detected by amplification procedures such as PCR or hybridization assays according to methods widely known in the art. Furthermore, RNA can be detected and analyzed using techniques such as single molecule fluorescent in situ hybridization (smFISH).

The sequence data 1320 generated by sequencing can be sent from the polynucleotide sequencer 1308 to the digital computer 1302 for analysis by the sequence data analyzer 1318, and also for presentation on an output device 1314.

Illustrative Site-Specific Nucleases

Restriction enzymes (restriction endonucleases) are present in many species and are capable of sequence-specific binding to DNA (at a target or recognition site), and cleaving DNA at or near the site of binding. Over 3000 restriction enzymes have been studied in detail, and more than 600 of these are available commercially. Naturally occurring restriction endonucleases are categorized into four groups (Types I, II, III, and IV) based on their composition and enzyme cofactor requirements, the nature of their target site, and the position of their DNA cleavage site relative to the target site. All types of enzymes recognize specific short DNA sequences and carry out the endonucleolytic cleavage of DNA to give specific fragments with terminal 5′-phosphates. Type II enzymes cleave within or at short specific distances from a recognition site; most require magnesium and are single-function (restriction) enzymes that act independently of a methylase. Type II enzymes form homodimers, with recognition sites that are usually undivided and palindromic and 4-8 nucleotides in length. They recognize and cleave DNA at the same site, and they do not use ATP or AdoMet for their activity; they usually require only Mg²⁺ as a cofactor. Common type II restriction enzymes include HhaI, HindIII, NotI, EcoRI, and BglI. Restriction enzymes may cut dsDNA in a way that leaves either blunt ends or sticky ends. Protocols for creating a DSB in dsDNA with restriction enzymes are well known to those skilled in the art. Restriction digest is a common molecular biology technique and is typically performed using the reagents and protocols provided in a commercially available restriction digest kit. Examples of companies that provide restriction digest kits include New England BioLabs, Promega, Sigma-Aldrich, and Thermo Fisher Scientific. Each of these companies provides restriction digest protocols on its website.

Homing endonucleases (HEs), which are also known as meganucleases, are a collection of double-stranded DNases that have large, asymmetric recognition sites (12-40 nt) and coding sequences that are usually embedded in either introns or inteins. Introns are spliced out of precursor RNAs, while inteins are spliced out of precursor proteins. HEs catalyze the hydrolysis of genomic DNA within the cells that synthesize them, but do so at few, or even a single, location(s) per genome. HE recognition sites are extremely rare. For example, an 18 nt recognition sequence will occur only once in every 7×10¹⁰ nucleotides of random sequence. This is equivalent to only one site in 20 mammalian-sized genomes. However, unlike restriction endonucleases, HEs tolerate some sequence degeneracy within their recognition sequence. Thus, single base changes do not abolish cleavage but reduce its efficiency to variable extents. As a result, their observed sequence specificity is typically in the range of 10-12 nt. Examples of suitable protocols using HEs may be found in Flick, K. et al., DNA Binding and Cleavage by the Nuclear Intron-Encoded Homing Endonuclease I-PpoI, 394 Nature 96 (1998) and Chevalier, B. et al., Design, Activity, and Structure of a Highly Specific Artificial Endonuclease, 10 Molecular Cell 895 (2002).
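The rarity figure above follows from simple counting: with four equally likely bases, a specific 18 nt sequence is expected about once per 4¹⁸ positions of random sequence. The short calculation below reproduces that estimate; the genome size of 3×10⁹ nucleotides is an assumed round number for a mammalian-sized genome, and the function name is hypothetical.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Expected number of random nucleotides per occurrence of one specific site."""
    return alphabet_size ** site_length

spacing = expected_spacing(18)       # 4**18 = 68,719,476,736, i.e. about 7 x 10^10
mammalian_genome = 3e9               # assumed genome size in nucleotides
genomes_per_site = spacing / mammalian_genome  # roughly 23, on the order of 20 genomes
```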

Zinc finger nucleases (ZFNs) are synthetic proteins consisting of an engineered zinc finger DNA-binding domain fused to the cleavage domain of the FokI restriction endonuclease. ZFNs can be used to induce DSBs in specific DNA sequences and thereby promote site-specific homologous recombination and targeted manipulation of genomic loci in a variety of different cell types. The introduction of a DSB into dsDNA may enhance the efficiency of recombination with an exogenously introduced HDR template. ZFNs consist of a DNA-binding zinc finger domain (composed of three to six fingers) covalently linked to the non-specific DNA cleavage domain of the bacterial FokI restriction endonuclease. ZFNs can bind as dimers to their target DNA sites, with each monomer using its zinc finger domain to recognize a half-site. Dimerization of ZFNs is mediated by the FokI cleavage domain which cleaves within a five or six nucleotide “spacer” sequence that separates the two inverted “half sites.” Because the DNA-binding specificities of zinc finger domains can in principle be re-engineered using one of various methods, customized ZFNs can be constructed to target nearly any DNA sequence. One of ordinary skill in the art will know how to design and use ZFNs to create DSBs in dsDNA at a desired target site. Some suitable protocols are available in Philipsborn, A. et al., Microcontact printing of axon guidance molecules for generation of graded patterns, 1 Nature Protocols 1322 (2006); John Young and Richard Harland, Targeted Gene Disruption with Engineered Zinc Finger Nucleases (ZFNs), 917 Xenopus Protocols 129 (2012), and Hansen, K. et al. Genome Editing with CompoZr Custom Zinc Finger Nucleases (ZFNs), 64 J. Vis. Exp. 3304 (2012).

TALENs are restriction enzymes that can be engineered to cut specific sequences of DNA. They are made by fusing a TAL effector DNA-binding domain to a DNA cleavage domain (i.e., a nuclease which cuts DNA strands). Transcription activator-like effectors (TALEs) can be engineered to bind practically any desired DNA sequence, so when combined with a nuclease, DNA can be cut at specific locations. The restriction enzymes can be introduced into cells, for use in gene editing or for genome editing in situ. The DNA binding domain contains a repeated highly conserved 33-34 amino acid sequence with divergent 12th and 13th amino acids. These two positions, referred to as the Repeat Variable Diresidue (RVD), are highly variable and show a strong correlation with specific nucleotide recognition. This straightforward relationship between amino acid sequence and DNA recognition has allowed for the engineering of specific DNA-binding domains by selecting a combination of repeat segments containing the appropriate RVDs. Notably, slight changes in the RVD and the incorporation of “nonconventional” RVD sequences can improve targeting specificity. One of ordinary skill in the art will know how to design and use TALENs to create DSBs in dsDNA at a desired target site. Some suitable protocols are available in Hermann, M. et al., Mouse Genome Engineering Using Designer Nucleases, 86 J. Vis. Exp. 50930 (2014) and Sakuma, T. et al., Efficient TALEN Construction and Evaluation Methods for Human Cell and Animal Applications, 18(4) Genes Cells 315 (2013).

In the CRISPR/Cas nuclease system, the CRISPR locus encodes RNA components of the system, and the Cas (CRISPR-associated) locus encodes proteins. CRISPR loci in microbial hosts contain a combination of CRISPR-associated (Cas) genes as well as non-coding RNA elements capable of programming the specificity of the CRISPR-mediated polynucleotide cleavage.

The Type II CRISPR is one of the most well characterized systems and carries out targeted double-stranded breaks in four sequential steps. First, two non-coding RNAs, the pre-crRNA array and tracrRNA, are transcribed from the CRISPR locus. Second, tracrRNA hybridizes to the repeat regions of the pre-crRNA and mediates the processing of pre-crRNA into mature crRNAs containing individual spacer sequences. Third, the mature crRNA:tracrRNA complex directs Cas9 to the target DNA via Watson-Crick base-pairing between the spacer on the crRNA and the protospacer on the target DNA next to the protospacer adjacent motif (PAM), an additional requirement for target recognition. In engineered CRISPR/Cas9 systems, gRNA, also called single-guide RNA (“sgRNA”), may replace crRNA and tracrRNA with a single RNA construct that includes the protospacer element and a linker loop sequence. Standard Watson-Crick base-pairing includes: adenine (A) pairing with thymine (T), adenine (A) pairing with uracil (U), and guanine (G) pairing with cytosine (C). In addition, it is also known in the art that for hybridization between two RNA molecules (e.g., dsRNA), guanine (G) base pairs with uracil (U). In the context of this disclosure, a guanine (G) is considered complementary to a uracil (U), and vice versa. As such, when a G/U base-pair can be made at a given nucleotide position of a protein-binding segment (dsRNA duplex) of a subject DNA-targeting RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary. Use of gRNA may simplify the components needed to use CRISPR/Cas9 for genome editing. The Cas9 species of different organisms have different PAM sequences. For example, Streptococcus pyogenes (Sp) has a PAM sequence of 5′-NGG-3′, Staphylococcus aureus (Sa) has a PAM sequence of 5′-NGRRT-3′ or 5′-NGRRN-3′, Neisseria meningitidis (Nm) has a PAM sequence of 5′-NNNNGATT-3′, Streptococcus thermophilus (St) has a PAM sequence of 5′-NNAGAAW-3′, and Treponema denticola (Td) has a PAM sequence of 5′-NAAAAC-3′.
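The PAM sequences listed above use IUPAC degenerate codes (N for any base, R for A or G, W for A or T). As an illustrative sketch, the positions of candidate PAMs on one strand of a DNA sequence can be located with regular expressions. The dictionary keys, function names, and example sequence are hypothetical; only the given strand is scanned, and only the first of the two S. aureus variants is included.

```python
import re

# IUPAC codes used by the PAM sequences listed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

# PAM sequences from the description, written 5' to 3'.
PAMS = {"Sp": "NGG", "Sa": "NGRRT", "Nm": "NNNNGATT", "St": "NNAGAAW", "Td": "NAAAAC"}

def pam_regex(pam):
    """Compile a PAM into a lookahead regex so that overlapping sites are all reported."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in pam) + "))")

def find_pam_sites(seq, pam):
    """Return the 0-based start positions of every PAM match on the given strand."""
    return [match.start() for match in pam_regex(pam).finditer(seq)]

# Made-up target sequence.
target = "ATGGGCTAAAAC"
sp_sites = find_pam_sites(target, PAMS["Sp"])
td_sites = find_pam_sites(target, PAMS["Td"])
```

Scanning the opposite strand would require repeating the search on the reverse complement of the target.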

Finally, Cas9 mediates cleavage of target DNA to create a DSB within the protospacer. Activity of the CRISPR/Cas system in nature comprises three steps: (i) insertion of alien DNA sequences into the CRISPR array to prevent future attacks, in a process called ‘adaptation,’ (ii) expression of the relevant proteins, as well as expression and processing of the array, followed by (iii) RNA-mediated interference with the alien polynucleotide. The alien polynucleotides come from viruses attacking the bacterial cell. Thus, in the bacterial cell, several of the so-called ‘Cas’ proteins are involved with the natural function of the CRISPR/Cas system and serve roles in functions such as insertion of the alien DNA, etc.

CRISPR may also function with nucleases other than Cas9. Two genes from the Cpf1 family contain a RuvC-like endonuclease domain, but they lack Cas9's second HNH endonuclease domain. Cpf1 cleaves DNA in a staggered pattern and requires only one RNA rather than the two (tracrRNA and crRNA) needed by Cas9 for cleavage. Cpf1's preferred PAM is 5′-TTN, differing from that of Cas9 (3′-NGG) in both genomic location and GC-content. Mature crRNAs for Cpf1-mediated cleavage are 42-44 nucleotides in length, about the same size as Cas9's, but with the direct repeat preceding the spacer rather than following it. The Cpf1 crRNA is also much simpler in structure than Cas9's; only a short stem-loop structure in the direct repeat region is necessary for cleavage of a target. Cpf1 also does not require an additional tracrRNA. Whereas Cas9 generates blunt ends 3 nt upstream of the PAM site, Cpf1 cleaves in a staggered fashion, creating a five nucleotide 5′ overhang 18-23 nt away from the PAM.

Other CRISPR-associated proteins besides Cas9 may be used instead of Cas9. For example, CRISPR-associated protein 1 (Cas1) is one of the two universally conserved proteins found in the CRISPR prokaryotic immune defense system. Cas1 is a metal-dependent DNA-specific endonuclease that produces double-stranded DNA fragments. Cas1 forms a stable complex with the other universally conserved CRISPR-associated protein, Cas2, which is part of spacer acquisition for CRISPR systems.

There are also CRISPR/Cas9 variants that do not use a PAM sequence, such as NgAgo. NgAgo functions with a 24-nucleotide ssDNA guide and is believed to cut 8-11 nucleotides from the start of this sequence. The ssDNA is loaded as the protein folds and cannot be swapped to a different guide unless the temperature is increased to a non-physiological 55° C. A few nucleotides in the target DNA are removed near the cut site. Techniques for using NgAgo are described in Gao, F. et al., DNA-guided Genome Editing Using the Natronobacterium Gregoryi Argonaute, 34 Nature Biotechnology 768 (2016).

DSBs may be formed by making two single-stranded breaks at different locations, creating a cut DNA molecule with sticky ends. Single-strand breaks or “nicks” may be formed by modified versions of the Cas9 enzyme containing only one active catalytic domain (called “Cas9 nickase”). Cas9 nickases still bind DNA based on gRNA specificity, but nickases are only capable of cutting one of the DNA strands. Two nickases targeting opposite strands are required to generate a DSB within the target DNA (often referred to as a “double nick” or “dual nickase” CRISPR system). This requirement dramatically increases target specificity, since it is unlikely that two off-target nicks will be generated within close enough proximity to cause a DSB. Techniques for using a dual nickase CRISPR system to create a DSB are described in Ran, et al., Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity, 154(6) Cell 1380 (2013).

In certain embodiments, any of the enzymes described in this disclosure may be a “functional derivative” of a naturally occurring protein. A “functional derivative” of a native sequence polypeptide is a compound having a qualitative biological property in common with a native sequence polypeptide. “Functional derivatives” include, but are not limited to, fragments of a native sequence and derivatives of a native sequence polypeptide and its fragments, provided that they have a biological activity in common with a corresponding native sequence polypeptide. A biological activity contemplated herein is the ability of the functional derivative to hydrolyze a DNA substrate into fragments. The term “derivative” encompasses amino acid sequence variants of a polypeptide, covalent modifications, and fusions thereof. Suitable derivatives of an enzyme or a fragment thereof include but are not limited to mutants, fusions, and covalent modifications of the protein or a fragment thereof. The enzyme, or a fragment thereof, as well as derivatives or a fragment thereof, may be obtainable from a cell or synthesized chemically or by a combination of these two procedures. The cell may be a cell that naturally produces the enzyme. A cell that naturally produces the enzyme may also be genetically engineered to produce the endogenous enzyme at a higher expression level or to produce the enzyme from an exogenously introduced polynucleotide, which polynucleotide encodes an enzyme that is the same or different from the endogenous enzyme. In some cases, a cell does not naturally produce the enzyme and is genetically engineered to produce the enzyme. The engineering may include adding the polynucleotide encoding the enzyme under the control of a promoter. The promoter may be an inducible promoter that is activated in response to a signal. The promoter may also be blocked by a different signal or molecule.

ILLUSTRATIVE EMBODIMENTS

The following clauses describe multiple possible embodiments for implementing the features described in this disclosure. The various embodiments described herein are not limiting nor is every feature from any given embodiment required to be present in another embodiment. Any two or more of the embodiments may be combined together unless context clearly indicates otherwise. As used herein in this document, “or” means and/or. For example, “A or B” means A without B, B without A, or A and B. As used herein, “comprising” means including all listed features and potentially including addition of other features that are not listed. “Consisting essentially of” means including the listed features and those additional features that do not materially affect the basic and novel characteristics of the listed features. “Consisting of” means only the listed features to the exclusion of any feature not listed.

Clause A. A method comprising: producing a first homology directed repair (HDR) template including at least a first splicing region and a barcode region, the first splicing region including a first sequence of nucleotides that is recognized by an enzyme to produce a cut in the first splicing region and the barcode region including a sequence of nucleotides that corresponds to a gene; inserting the first HDR template into a target site of the gene using HDR; splicing, using the enzyme, the first HDR template in at least the first splicing region to produce a second HDR template, the second HDR template including a sequence of nucleotides that includes a portion of the first splicing region and the barcode region; inserting the second HDR template into a double stranded polynucleotide using HDR; sequencing the double stranded polynucleotide to produce sequencing data; and determining that the gene has been expressed based at least partly on identifying the sequence of nucleotides of the barcode region in the sequencing data.

Clause B. The method of clause A, wherein: the first HDR template includes a second splicing region; the first splicing region is homologous to a first portion of a target site of the gene; and the second splicing region is homologous to a second portion of the target site of the gene.

Clause C. The method of clause A or B, wherein the first HDR template is inserted in the 3′ untranslated region of the gene.

Clause D. The method of any one of clauses A-C, wherein the double stranded polynucleotide is at least one of genomic DNA, artificial DNA, circular DNA, or linear DNA.

Clause E. The method of any one of clauses A-D, wherein the enzyme is a spliceosome, and the method further comprises designing the first HDR template such that the first splicing region includes the sequence of nucleotides recognized by the spliceosome and the second HDR template remains viable to perform HDR with the double stranded polynucleotide for a specified period of time.

Clause F. The method of any one of clauses A-E, further comprising: before inserting the first HDR template into the target site, inserting a third HDR template into the gene using HDR, wherein the third HDR template includes the target site.

Clause G. The method of any one of clauses A-F, further comprising: generating data indicating a plurality of barcode sequences, wherein the gene is one of a plurality of genes; and associating individual genes of the plurality of genes with a respective barcode sequence of the plurality of barcode sequences such that each barcode sequence of the plurality of barcode sequences corresponds to a particular gene.

Clause H. The method of any one of clauses A-G, further comprising producing a gene product as a result of the expression of the gene, wherein: the gene product includes a single stranded polynucleotide sequence that includes a first section corresponding to the first splicing region and a second section corresponding to the barcode region.

Clause I. A system comprising: a gene including a double stranded polynucleotide having a target site; an enzyme configured to create a double strand break in the double stranded polynucleotide of the gene at a cut site in the target site; and an HDR template including at least a first splicing region and a barcode sequence corresponding to the gene; wherein the HDR template is inserted into the target site with homology directed repair (HDR) at the cut site after the enzyme creates a break at the cut site.

Clause J. The system of clause I, wherein the system comprises a single eukaryotic cell or a single prokaryotic cell.

Clause K. The system of clause I or J, further comprising an additional double stranded polynucleotide including an additional target site.

Clause L. The system of clause K, wherein: at least a portion of the first HDR template is removed from the double stranded polynucleotide of the gene using at least one spliceosome to produce a second HDR template that includes at least the barcode sequence and a portion of the first splicing region.

Clause M. The system of any one of clauses I-L, wherein expression of the gene produces an RNA precursor that includes a single stranded polynucleotide including: a first sequence that corresponds to the first splicing region; a second sequence that corresponds to the barcode sequence; a 3′ untranslated region (UTR) and a 5′ UTR; and a coding region that includes an intron and an exon.

Clause N. The system of clause M, wherein the intron included in the RNA precursor includes the HDR template.

Clause O. The system of clause M, wherein the 3′ UTR includes the first sequence and the second sequence.

Clause P. A system comprising: a gene; a double stranded polynucleotide including a target site; a homology directed repair (HDR) template including a barcode region having a sequence of nucleotides that corresponds to the gene; and an enzyme configured to create a double strand break in the double stranded polynucleotide at the target site; wherein the HDR template is inserted into the double stranded polynucleotide by HDR to produce a modified double stranded polynucleotide.

Clause Q. The system of clause P, wherein: the modified double stranded polynucleotide includes an additional target site; the system further comprises an additional HDR template; and the additional HDR template is inserted into the additional target site via HDR.

Clause R. The system of clause Q, wherein: the system further comprises a first gene encoding the HDR template and a second gene encoding the additional HDR template; expression of the first gene causes the HDR template to become available for insertion into the target site; and expression of the second gene causes the additional HDR template to become available for insertion into the additional target site.

Clause S. The system of clause R, wherein: the second gene is expressed in response to a signal that occurs at a particular time; and analysis of a sequence of the modified double stranded polynucleotide indicates a period of time that the first gene was expressed based at least partly on the presence of the additional HDR template in the sequence of the modified double stranded polynucleotide.

Clause T. The system of any one of clauses P-S, further comprising: an additional gene that includes an additional HDR template having a sequence that includes the sequence of nucleotides of the barcode region and at least one splicing region; and an additional enzyme to remove at least a portion of the additional HDR template to create the HDR template and make the HDR template available for insertion into the double stranded polynucleotide.

Clause U. A method comprising: producing a first homology directed repair (HDR) template including a region complementary to a first portion of an RNA strand, wherein the RNA strand is produced from expression of a gene; producing a second HDR template including a region complementary to a second portion of the RNA strand; annealing the first HDR template to the first portion of the RNA strand and the second HDR template to the second portion of the RNA strand to produce a modified RNA strand; joining a 5′ end of the first HDR template and a 3′ end of the second HDR template to produce a third HDR template; inserting the third HDR template into a target site of a polynucleotide using HDR to produce a modified polynucleotide; and determining that the gene has been expressed based at least partly on sequencing data of the modified polynucleotide.

Clause V. The method of clause U, wherein the RNA strand is messenger RNA (mRNA) produced during the expression of the gene.

Clause W. The method of clause U or V, wherein the third HDR template is annealed to the RNA strand as a portion of the third HDR template is being inserted into the target site.

Clause X. The method of clause U or V, wherein the third HDR template is separated from the RNA strand as a portion of the third HDR template is being inserted into the target site.

Clause Y. A system comprising: a gene; a first homology directed repair (HDR) template including a first portion of a barcode sequence; and a second HDR template including a second portion of the barcode sequence; wherein the gene produces an RNA strand during the expression of the gene and a first region of the first HDR template anneals to a complementary first region of the RNA strand and a second region of the second HDR template anneals to a complementary second region of the RNA strand.

Clause Z. The system of clause Y, wherein a hybridized product of the first HDR template, the second HDR template, and the RNA strand forms a template for a third HDR template.

Clause AA. The system of clause Y or Z, wherein the first region of the RNA strand is adjacent to the second region of the RNA strand.

Clause BB. The system of any one of clauses Y-AA, wherein a 5′ end of the first HDR template is joined to a 3′ end of the second HDR template.

Clause CC. The system of any one of clauses Y-BB, further comprising a polynucleotide including a target region.

Clause DD. The system of clause CC, wherein the first HDR template includes a first sequence separate from the first region that is complementary to a first portion of the target region and the second HDR template includes a second sequence separate from the second region that is complementary to a second portion of the target region.

Clause EE. The system of clause DD, further comprising an enzyme to create a double strand break (DSB) at a cut site of the target region of the polynucleotide; and wherein the barcode sequence is inserted into the polynucleotide at the cut site to produce a modified polynucleotide using HDR.
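By way of illustration only, the association of genes with barcode sequences (Clause G) and the determination of expression from sequencing data (Clause A) may be sketched as follows. The function names, the barcode length of 8 nucleotides, the minimum mismatch count of 2, and the exact-substring matching are assumptions made for this sketch and are not requirements of any embodiment.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; non-ACGT symbols are left as-is."""
    return seq.translate(_COMPLEMENT)[::-1]


def _mismatches(a: str, b: str) -> int:
    """Count differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcodes(genes, length=8, min_mismatches=2):
    """Associate each gene with its own barcode sequence.

    Candidates are enumerated in a fixed order and accepted greedily
    when they (and their reverse complements) differ from every barcode
    already chosen in at least min_mismatches positions (an assumption
    made here so that barcodes are not easily confused).
    """
    candidates = ("".join(p) for p in product("ACGT", repeat=length))
    chosen, mapping = [], {}
    for gene in genes:
        for cand in candidates:
            rc = reverse_complement(cand)
            if all(_mismatches(cand, c) >= min_mismatches
                   and _mismatches(rc, c) >= min_mismatches
                   for c in chosen):
                chosen.append(cand)
                mapping[gene] = cand
                break
        else:
            raise ValueError("barcode space exhausted; increase length")
    return mapping


def expressed_genes(reads, mapping):
    """Return the genes whose barcode appears in any sequencing read.

    Both strands are checked because a read may come from either strand
    of the double stranded polynucleotide.
    """
    found = set()
    for read in reads:
        read = read.upper()
        for gene, barcode in mapping.items():
            if barcode in read or reverse_complement(barcode) in read:
                found.add(gene)
    return found
```

In use, assign_barcodes would be called once for the plurality of genes, and expressed_genes would be applied to reads obtained by sequencing the double stranded polynucleotide. A practical analyzer would likely also tolerate sequencing errors, which this sketch does not attempt.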

CONCLUSION

The terms “a,” “an,” “the” and similar referents used in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The term “based on” is to be construed to cover both exclusive and nonexclusive relationships. For example, “A is based on B” means that A is based at least in part on B and may be based wholly on B. By “about” is meant a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that varies by as much as 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1% to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
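By way of illustration only, the meaning of “about” given above may be expressed as a numeric tolerance check. The function name is_about and its parameter names are hypothetical and are introduced solely for this sketch.

```python
def is_about(value: float, reference: float, percent: float = 10.0) -> bool:
    """Return True when value varies from reference by at most percent.

    The permitted variation is between 1% and 10%, following the
    definition of "about" in the preceding paragraph.
    """
    if not 1.0 <= percent <= 10.0:
        raise ValueError("percent must be between 1 and 10")
    allowed = abs(reference) * percent / 100.0
    return abs(value - reference) <= allowed
```

For example, under the 10% reading a value of 109 is about 100, whereas a value of 111 is not.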

All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of all examples and exemplary language (e.g., “such as”) provided herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.

Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member may be referred to and claimed individually or in any combination with other members of the group or other elements found herein. It is anticipated that one or more members of a group may be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is deemed to contain the group as modified, thus fulfilling the written description of all Markush groups used in the appended claims.

Certain embodiments are described herein, including the best mode known to the inventors for carrying out the invention. Of course, variations on these described embodiments will become apparent to those of ordinary skill in the art upon reading the foregoing description. Skilled artisans will know how to employ such variations as appropriate, and the embodiments disclosed herein may be practiced otherwise than specifically described. Accordingly, all modifications and equivalents of the subject matter recited in the claims appended hereto are included within the scope of this disclosure. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts are disclosed as example forms of implementing the claims.

Furthermore, references have been made to publications, patents and/or patent applications (collectively “references”) throughout this specification. Each of the cited references is individually incorporated herein by reference for its particular cited teachings as well as for all that it discloses.

1. A method for identifying expression of a gene, the method comprising: producing, in response to expression of the gene, a gene product including at least a first splicing region and a barcode sequence, the first splicing region including a first sequence of nucleotides that is recognized by an enzyme to produce a cut in the first splicing region and the barcode sequence including a sequence of nucleotides that uniquely identifies the gene; splicing, using the enzyme, a homology directed repair (HDR) template from the gene product by cutting at least the first splicing region, the HDR template including a sequence of nucleotides that includes a portion of the first splicing region and the barcode sequence; inserting the HDR template into a double stranded polynucleotide using HDR; sequencing the double stranded polynucleotide to produce sequencing data; and determining that the gene has been expressed based at least partly on identifying the sequence of nucleotides of the barcode sequence in the sequencing data.
2. The method of claim 1, wherein: the gene product includes a second splicing region; a portion of the first splicing region is homologous to a first portion of a target site in the double stranded polynucleotide; and a portion of the second splicing region is homologous to a second portion of the target site in the double stranded polynucleotide.
3. The method of claim 1, wherein the barcode sequence is located in a 3′ untranslated region of the gene.
4. The method of claim 1, wherein the enzyme is a spliceosome, and the method further comprises designing the gene product such that the first splicing region includes a sequence of nucleotides recognized by the spliceosome and the HDR template remains viable to perform HDR with the double stranded polynucleotide for a period of time.
5. The method of claim 1, further comprising: modifying the gene by inserting, via HDR, a first HDR template that adds the first splicing region and the barcode sequence to the gene.
6. The method of claim 1, further comprising: generating data indicating a plurality of barcode sequences, wherein the gene is one of a plurality of genes; and uniquely associating individual genes of the plurality of genes with one of the plurality of barcode sequences such that each barcode sequence of the plurality of barcode sequences corresponds to only one of the individual genes.
7. The method of claim 1, wherein the gene product is an RNA precursor that is a single stranded polynucleotide which includes: a first sequence that corresponds to the first splicing region; a second sequence that corresponds to the barcode sequence; a 3′ untranslated region (UTR) and a 5′ UTR; and a coding region that includes an intron and an exon.
8. The method of claim 7, wherein the RNA precursor includes a gene expression region that comprises the barcode sequence.
9. The method of claim 8, wherein the splicing of the gene expression region from the gene product produces the HDR template.
10. A system for identifying expression of a gene, the system comprising: a gene that, when expressed, produces a gene product, wherein the gene product includes at least a first splicing region and a barcode sequence, the first splicing region including a first sequence of nucleotides that is recognized by a first enzyme to produce a cut in the first splicing region and the barcode sequence including a sequence of nucleotides that uniquely identifies the gene; the first enzyme configured to splice a homology directed repair (HDR) template from the gene product by cutting at least the first splicing region, wherein the HDR template includes a sequence of nucleotides that includes a portion of the first splicing region and the barcode sequence; a double stranded polynucleotide including a target site, wherein a first subsequence of the target site hybridizes to a first sequence of the HDR template and a second subsequence of the target site hybridizes to a second sequence of the HDR template such that the double stranded polynucleotide is configured to incorporate the HDR template using HDR; and a second enzyme configured to create a double strand break in the double stranded polynucleotide at a cut site in the target site.
11. The system of claim 10, wherein the first enzyme is a spliceosome and the second enzyme is a restriction enzyme, a homing endonuclease, a zinc-finger nuclease, a transcription activator-like effector nuclease, CRISPR/Cas, or NgAgo.
12. The system of claim 10, wherein the double stranded polynucleotide is at least one of genomic DNA, artificial DNA, circular DNA, or linear DNA.
13. The system of claim 10, wherein the barcode sequence is located in a 3′ untranslated region of the gene.
14. The system of claim 10, wherein: the gene product includes a second splicing region; the barcode sequence is located between the first splicing region and the second splicing region; a portion of the first splicing region is homologous to a first portion of a target site; and a portion of the second splicing region is homologous to a second portion of the target site.
15. The system of claim 10, wherein the gene product is an RNA precursor that is a single stranded polynucleotide which includes: a first sequence that corresponds to the first splicing region; a second sequence that corresponds to the barcode sequence; a 3′ untranslated region (UTR) and a 5′ UTR; and a coding region that includes an intron and an exon.
16. The system of claim 10, wherein the double stranded polynucleotide includes a second target site, the second target site configured to incorporate via HDR a second HDR template generated from a second gene product expressed by a second gene, the second HDR template including a second barcode sequence that uniquely identifies the second gene.
17. The system of claim 10, wherein the gene includes a target site, the system further comprising: a third enzyme configured to create a double strand break in the gene at a cut site in the target site; and a first HDR template configured to add the first splicing region and the barcode sequence to the gene by HDR.
18. The system of claim 10, further comprising a polynucleotide sequencer configured to sequence the double stranded polynucleotide and produce sequencing data.
19. The system of claim 18, further comprising a digital computer comprising a sequence data analyzer configured to identify the barcode sequence in the sequencing data.
20. The system of claim 19, wherein: the gene is configured to be expressed in response to a signal that occurs at a particular time; and the sequence data analyzer is further configured to analyze the sequencing data and determine a period of time that the gene was expressed based at least partly on the presence of the barcode sequence in the sequencing data.